SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Speeding up
Networking
Van Jacobson
van@packetdesign.com

Bob Felderman
feldy@precisionio.com

Precision I/O

Linux.conf.au 2006
Dunedin, NZ
This talk is not about ‘fixing’
the Linux networking stack
The Linux networking stack isn’t broken.
care of
• The people who takedo goodthe stack know
what they’re doing &
work.
on
• Based has all the measurements I’m aware of, of
Linux
the fastest & most complete stack
any OS.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

2
This talk is about fixing
an architectural problem
created a long time ago
in a place far, far away. . .
LCA06 - Jan 27, 2006 - Jacobson / Felderman

3
LCA06 - Jan 27, 2006 - Jacobson / Felderman

4
In the beginning . . .
ARPA created MULTICS
• First OS networking stack (MIT, 1970)
multi-user ‘super-computer’
• Ran on a @ 0.4 MIPS)
(GE-640
than100
• Rarely feweruser. users; took ~2 minutes
to page in a

• Since ARPAnet performance depended only
on how fast host could empty its 6 IMP
buffers, had to put stack in kernel.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

4
The Multics stack begat
many other stacks . . .
• First TCP/IP stack done on Multics (1980)
went to
• People from that projectBerkeley BBN to
do first TCP/IP stack for
Unix
(1983).
used BBN stack as
• Berkeley CSRGfor 4.1c BSD stack (1985).
functional spec
University of
• CSRG wins long battle withstack source
California lawyers & makes
available under ‘BSD copyright’ (1987).
LCA06 - Jan 27, 2006 - Jacobson / Felderman

5
Multics architecture, as
elaborated by Berkeley,
became ‘Standard Model’
Interrupt level

Task level

System
ISR
packet

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint
skb

Application
Socket

read()

byte stream

6
“The way we’ve always done it”
is not necessarily the same as
“the right way to do it”

There are a lot of problems associated
with this style of implementation . . .

LCA06 - Jan 27, 2006 - Jacobson / Felderman

7
Protocol Complication
• Since data is received by destination
kernel, “window” was added to distinguish
between “data has arrived” & “data was
consumed”.
addition more than triples the size
• Thisprotocol (window probes, persist of
the
states) and is responsible for at least half
the interoperability issues (Silly Window
Syndrome, FIN wars, etc.)
LCA06 - Jan 27, 2006 - Jacobson / Felderman

8
Internet Stability
• You can view a network connection as a
servo-loop:
S

R

• A kernel-based protocol implementation
converts this to two coupled loops:
•

S
K
R
A very general theorem (Routh-Hurwitz)
says that the two coupled loops will always
be less stable then one.

also hides the receiving
• The kernel loopthe sender which screws app
dynamics from
up
the RTT estimate & causes spurious
retransmissions.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

9
Compromises
Even for a simple stream abstraction like TCP,
there’s no such thing as a “one size fits all”
protocol implementation.

• The packetization and send strategies are
completely different for bulk data vs.
transactions vs. event streams.
ack strategies are completely
• Thestreaming vs. request response.different
for
Some of this can be handled with sockopts
but some app / kernel implementation
mismatch is inevitable.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

10
Performance
(the topic for the rest of this talk)
have
• Kernel-based implementations oftenuser).
extra data copies (packet to skb to

• Kernel-based implementations often have
extra boundary crossings (hardware
interrupt to software interrupt to context
switch to syscall return).

• Kernel-based implementations often have
lock contention and hotspots.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

11
Why should we care?
gotten fast
• Networking gear hasenough ($10enough8
(10Gb/s) and cheap
for an
port Gb switch) that it’s changing from a
communications technology to a backplane
technology.

• The huge mismatch between processor
clock rate & memory latency has forced
chip makers to put multiple cores on a die.

LCA06 - Jan 27, 2006 - Jacobson / Felderman

12
Why multiple cores?
• Vanilla 2GHz P4 issues 2-4 instr / clock
 4-8 instr / ns.

• Internal structure of DRAM chip makes
cache line fetch take 50-100ns (FSB speed
doesn’t matter).
did 400 instructions of computing
• If you cache line, system would be 50% on
every
efficient with one core & 100% with two.
more like 20
/ line
• Typical number iswith one core instrcores
or 2.5% efficient
(20
for 100%).
LCA06 - Jan 27, 2006 - Jacobson / Felderman

13
Good system performance
comes from having lots of
cores working independently
• This is the canonical Internet problem.
is called the “end-to-end
• The solutionsays you should push all work
principle”. It
to the edge & do the absolute minimum
inside the net.

LCA06 - Jan 27, 2006 - Jacobson / Felderman

14
The end of the wire isn’t
the end of the net
On a uni-processor it doesn’t matter but on a
multi-processor the protocol work should be done
on the processor that’s going to consume the data.
This means ISR & Softint should
()
do almost nothing and Socket
ead
r
t
should do everything.
cke
So
ISR

Softint

Socket

read()

S oc

ket
rea
d

()

LCA06 - Jan 27, 2006 - Jacobson / Felderman

15
How good is the stack at
spreading out the work?
Let’s look at some

LCA06 - Jan 27, 2006 - Jacobson / Felderman

16
Test setup
Dell Poweredge 1750s (2.4GHz P4 Xeon,
• Two processor, hyperthreading off) hooked up
dual
back-to-back via Intel e1000 gig ether cards.

2.6.15
• Running stock(6.3.9). plus current Sourceforge
e1000 driver
with oprofile
• Measurements done runs. Showing 0.9.1. Each 5.
test was 5 5-minute
median of

• booted with idle=poll_idle. Irqbalance off.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

17
15

Digression: comparing
two profiles
●

e1000_intr
●

10

●

e1000_clean

●

●

__copy_user_intel

5

schedule

__switch_to
● ● tcp_v4_rcv
●

0

device interrupt & app on different cpu

_raw_spin_lock

●
●
●
● ● ●●
● ●
● ● ●
● ● ●●
● ● ●●
●
●
● ● ●●●
●●
●
● ●
●
●● ●
●●
●●
●● ● ●
●● ●●●
●
●
●●
●● ●
●
●
●
●● ●
●●
●●
●●
●●
●●
●

0

●
●

5

10

15

device interrupt & app on same cpu

LCA06 - Jan 27, 2006 - Jacobson / Felderman

18
Uni vs. dual processor
(netperf) with cpu
• 1cpu: run netservercpu as e1000 interrupts.
affinity set to same
netserver with
• 2cpu: runcpu from e1000 cpu affinity set to
different
interrupts.

(%) Busy
1cpu 50
2cpu 77

Intr

7
9

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13

16
24

8
14

5
12

App

1
1
19
Uni vs. dual processor
(netperf) with cpu
• 1cpu: run netservercpu as e1000 interrupts.
affinity set to same
netserver with
• 2cpu: runcpu from e1000 cpu affinity set to
different
interrupts.

(%) Busy
1cpu 50
2cpu 77

Intr

7
9

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13

16
24

8
14

5
12

App

1
1
19
This is just Amdahl’s law
in action
When adding additional processors:

• Benefit (cycles to do work) grows at most
linearly.
competition,
• Cost (contention, grows quadratically.
serialization, etc.)

•

System capacity goes as C(n) = an - bn2
For big enough n, the quadratic always wins.

• The key to good scaling is to minimize b.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

20
Locking destroys
performance two ways
has multiple writers
• The lock(fabulously expensive) so each has
to do a
RFO cache
cycle.
update which
• The lock requires an atomicthe cache.
is implemented by freezing
To go fast you want to have a single writer
per line and no locks.

LCA06 - Jan 27, 2006 - Jacobson / Felderman

21
• Networking involves a lot of queues.
They’re often implented as doubly linked
lists:
the
child
• This isuser poster write for cache thrashing.
Every
has to
every line and
every change has to be made multiple
places.
network
• Since most consumercomponents have a
producer /
relationship, a lock
free fifo can work a lot better.

LCA06 - Jan 27, 2006 - Jacobson / Felderman

22
net_channel - a cache aware,
cache friendly queue
typedef struct {
uint16_t
tail;
uint8_t
wakecnt;
uint8_t
pad;
} net_channel_producer_t;
typedef struct {
uint16_t
head;
uint8_t
wakecnt;
uint8_t
wake_type;
void*
wake_arg;
} net_channel_consumer_t;
struct {

/* next element to add */
/* do wakeup if != consumer wakecnt */

/*
/*
/*
/*

next element to remove */
increment to request wakeup */
how to wakeup consumer */
opaque argument to wakeup routine */

net_channel_producer_t p CACHE_ALIGN;
uint32_t q[NET_CHANNEL_Q_ENTRIES];
net_channel_consumer_t c;
} net_channel_t ;

LCA06 - Jan 27, 2006 - Jacobson / Felderman

/* producer's header */
/* consumer's header */

23
net_channel (cont.)
#define NET_CHANNEL_ENTRIES 512 /* approx number of entries in channel q */
#define NET_CHANNEL_Q_ENTRIES 
((ROUND_UP(NET_CHANNEL_ENTRIES*sizeof(uint32_t),CACHE_LINE_SIZE) 
- sizeof(net_channel_producer_t) - sizeof(net_channel_consumer_t)) 
/ sizeof(uint32_t))
#define CACHE_ALIGN __attribute__((aligned(CACHE_LINE_SIZE)))
static inline void net_channel_queue(net_channel_t *chan, uint32_t item) {
uint16_t tail = chan->p.tail;
uint16_t nxt = (tail + 1) % NET_CHANNEL_Q_ENTRIES;
if (nxt != chan->c.head) {
chan->q[tail] = item;
STORE_BARRIER;
chan->p.tail = nxt;
if (chan->p.wakecnt != chan->c.wakecnt) {
++chan->p.wakecnt;
net_chan_wakeup(chan);
}
}
}
LCA06 - Jan 27, 2006 - Jacobson / Felderman

24
“Channelize” driver
• Remove e1000 driver hard_start_xmit &in
napi_poll routines. No softint code left
driver & no skb’s (driver deals only in packets).

• Send packets to generic_napi_poll via a net
channel.
(%) Busy
1cpu 50
2cpu 77
drvr 58

Intr

7
9
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12

16
24
16

8
14
9

5
12
9

App

1
1
1
25
“Channelize” driver
• Remove e1000 driver hard_start_xmit &in
napi_poll routines. No softint code left
driver & no skb’s (driver deals only in packets).

• Send packets to generic_napi_poll via a net
channel.
(%) Busy
1cpu 50
2cpu 77
drvr 58

Intr

7
9
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12

16
24
16

8
14
9

5
12
9

App

1
1
1
25
“Channelize” socket
“registers” transport signature with
• socket on “accept()”. Gets back a channel.
driver
with matching
• driver drops all packetschannel & wakes app
signature into socket’s
if sleeping in socket code. Socket code
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock

Busy

Intr

50
77
58
28

7
9
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0

16
24
16
16

8
14
9
1

5
12
9
3

App

1
1
1
1
26
“Channelize” socket
“registers” transport signature with
• socket on “accept()”. Gets back a channel.
driver
with matching
• driver drops all packetschannel & wakes app
signature into socket’s
if sleeping in socket code. Socket code
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock

Busy

Intr

50
77
58
28

7
9
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0

16
24
16
16

8
14
9
1

5
12
9
3

App

1
1
1
1
26
“Channelize” socket
“registers” transport signature with
• socket on “accept()”. Gets back a channel.
driver
with matching
• driver drops all packetschannel & wakes app
signature into socket’s
if sleeping in socket code. Socket code
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock

Busy

Intr

50
77
58
28

7
9
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0

16
24
16
16

8
14
9
1

5
12
9
3

App

1
1
1
1
26
“Channelize” App

• App “registers” transport signature. Gets back an
(mmaped) channel & buffer pool.
matching packets into
• driver drops sleeping. TCP stack in channel &
wakes app if
library
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock
App

Busy

Intr

50
77
58
28
14

7
9
6
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0
0

16
24
16
16
0

8
14
9
1
0

5
12
9
3
2

App

1
1
1
1
5
27
“Channelize” App

• App “registers” transport signature. Gets back an
(mmaped) channel & buffer pool.
matching packets into
• driver drops sleeping. TCP stack in channel &
wakes app if
library
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock
App

Busy

Intr

50
77
58
28
14

7
9
6
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0
0

16
24
16
16
0

8
14
9
1
0

5
12
9
3
2

App

1
1
1
1
5
27
“Channelize” App

• App “registers” transport signature. Gets back an
(mmaped) channel & buffer pool.
matching packets into
• driver drops sleeping. TCP stack in channel &
wakes app if
library
processes packet(s) on wakeup.

(%)
1cpu
2cpu
drvr
sock
App

Busy

Intr

50
77
58
28
14

7
9
6
6
6

LCA06 - Jan 27, 2006 - Jacobson / Felderman

Softint Socket Locks Sched

11
13
12
0
0

16
24
16
16
0

8
14
9
1
0

5
12
9
3
2

App

1
1
1
1
5
27
10Gb/s ixgb netpipe tests
NPtcp streaming test between two nodes.

NPtcp ping-pong test between two nodes (one-way latency measured).

(4.3Gb/s throughput limit due to DDR333 memory;
cpus were loafing)
LCA06 - Jan 27, 2006 - Jacobson / Felderman

28
more 10Gb/s
NPtcp ping-pong test between two nodes (one-way latency measured).

LCA06 - Jan 27, 2006 - Jacobson / Felderman

29
more 10Gb/s
LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes)
SendRecv bandwidth (bigger is better)

LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes)
SendRecv Latency (smaller is better)
LCA06 - Jan 27, 2006 - Jacobson / Felderman

30
more 10Gb/s
LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes)
SendRecv Latency (smaller is better)

LCA06 - Jan 27, 2006 - Jacobson / Felderman

31
Conclusion
relatively trivial changes, it’s
• With some finish the good work started by
possible to
NAPI & get rid of almost all the interrupt /
softint processing.

• As a result, everything gets a lot faster.
• Get linear scalability on multi-cpu / multicore systems.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

32
Conclusion (cont.)
• Drivers get simpler (hard_start_xmit &
napi_poll become generic; drivers only
service hardware interrupts).
can
• Anythinglocks,send or receive packets,
without
very cheaply.

• Easy, incremental transition strategy.
LCA06 - Jan 27, 2006 - Jacobson / Felderman

33

Contenu connexe

Tendances

SQL-Server Database.pdf
SQL-Server Database.pdfSQL-Server Database.pdf
SQL-Server Database.pdfShehryarSH1
 
Automating the Entire PostgreSQL Lifecycle
Automating the Entire PostgreSQL Lifecycle Automating the Entire PostgreSQL Lifecycle
Automating the Entire PostgreSQL Lifecycle anynines GmbH
 
From Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsFrom Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsTyler Treat
 
Odi installation guide
Odi installation guideOdi installation guide
Odi installation guideprakashdas05
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKai Wähner
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @JavaPeter Lawrey
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...DevOps.com
 
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...Splunk
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instanceAmit Bhalla
 
Performance tuning a quick intoduction
Performance tuning   a quick intoductionPerformance tuning   a quick intoduction
Performance tuning a quick intoductionRiyaj Shamsudeen
 
Exadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cExadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cAlfredo Krieg
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...Frank Munz
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenJeffrey T. Pollock
 
Microservices Integration Patterns with Kafka
Microservices Integration Patterns with KafkaMicroservices Integration Patterns with Kafka
Microservices Integration Patterns with KafkaKasun Indrasiri
 

Tendances (20)

Java Performance Tuning
Java Performance TuningJava Performance Tuning
Java Performance Tuning
 
SQL-Server Database.pdf
SQL-Server Database.pdfSQL-Server Database.pdf
SQL-Server Database.pdf
 
Automating the Entire PostgreSQL Lifecycle
Automating the Entire PostgreSQL Lifecycle Automating the Entire PostgreSQL Lifecycle
Automating the Entire PostgreSQL Lifecycle
 
From Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed SystemsFrom Mainframe to Microservice: An Introduction to Distributed Systems
From Mainframe to Microservice: An Introduction to Distributed Systems
 
Odi installation guide
Odi installation guideOdi installation guide
Odi installation guide
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
Open HFT libraries in @Java
Open HFT libraries in @JavaOpen HFT libraries in @Java
Open HFT libraries in @Java
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
 
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...
Splunk conf2014 - Lesser Known Commands in Splunk Search Processing Language ...
 
Less04 database instance
Less04 database instanceLess04 database instance
Less04 database instance
 
Performance tuning a quick intoduction
Performance tuning   a quick intoductionPerformance tuning   a quick intoduction
Performance tuning a quick intoduction
 
Exadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cExadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13c
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
The C10k Problem
The C10k ProblemThe C10k Problem
The C10k Problem
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Less03 db dbca
Less03 db dbcaLess03 db dbca
Less03 db dbca
 
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...What You Should Know About WebLogic Server 12c (12.2.1.2)  #oow2015 #otntour2...
What You Should Know About WebLogic Server 12c (12.2.1.2) #oow2015 #otntour2...
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest Rakuten
 
Microservices Integration Patterns with Kafka
Microservices Integration Patterns with KafkaMicroservices Integration Patterns with Kafka
Microservices Integration Patterns with Kafka
 

Similaire à Van jaconson netchannels

DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Meltonharryvanhaaren
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchYutaka Yasuda
 
100G Networking Berlin.pdf
100G Networking Berlin.pdf100G Networking Berlin.pdf
100G Networking Berlin.pdfJunZhao68
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopDataWorks Summit
 
Scaling Networks Lab Manual 1st Edition Cisco Solutions Manual
Scaling Networks Lab Manual 1st Edition Cisco Solutions ManualScaling Networks Lab Manual 1st Edition Cisco Solutions Manual
Scaling Networks Lab Manual 1st Edition Cisco Solutions Manualnudicixox
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesAMD
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
Porting_uClinux_CELF2008_Griffin
Porting_uClinux_CELF2008_GriffinPorting_uClinux_CELF2008_Griffin
Porting_uClinux_CELF2008_GriffinPeter Griffin
 
Lustre Generational Performance Improvements & New Features
Lustre Generational Performance Improvements & New FeaturesLustre Generational Performance Improvements & New Features
Lustre Generational Performance Improvements & New Featuresinside-BigData.com
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxScyllaDB
 
Ccnp3 lab 3_1_en (hacer)
Ccnp3 lab 3_1_en (hacer)Ccnp3 lab 3_1_en (hacer)
Ccnp3 lab 3_1_en (hacer)Omar Herrera
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchJim St. Leger
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technologySZ Lin
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockScyllaDB
 
Atf 3 q15-2 - product preview
Atf 3 q15-2 - product previewAtf 3 q15-2 - product preview
Atf 3 q15-2 - product previewMason Mei
 

Similaire à Van jaconson netchannels (20)

DPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. MeltonDPDK Integration: A Product's Journey - Roger B. Melton
DPDK Integration: A Product's Journey - Roger B. Melton
 
FPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow SwitchFPGA based 10G Performance Tester for HW OpenFlow Switch
FPGA based 10G Performance Tester for HW OpenFlow Switch
 
D031201021027
D031201021027D031201021027
D031201021027
 
100G Networking Berlin.pdf
100G Networking Berlin.pdf100G Networking Berlin.pdf
100G Networking Berlin.pdf
 
XS Boston 2008 Network Topology
XS Boston 2008 Network TopologyXS Boston 2008 Network Topology
XS Boston 2008 Network Topology
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
Scaling Networks Lab Manual 1st Edition Cisco Solutions Manual
Scaling Networks Lab Manual 1st Edition Cisco Solutions ManualScaling Networks Lab Manual 1st Edition Cisco Solutions Manual
Scaling Networks Lab Manual 1st Edition Cisco Solutions Manual
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip ArchitecturesISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Porting_uClinux_CELF2008_Griffin
Porting_uClinux_CELF2008_GriffinPorting_uClinux_CELF2008_Griffin
Porting_uClinux_CELF2008_Griffin
 
10 sdn-vir-6up
10 sdn-vir-6up10 sdn-vir-6up
10 sdn-vir-6up
 
Lustre Generational Performance Improvements & New Features
Lustre Generational Performance Improvements & New FeaturesLustre Generational Performance Improvements & New Features
Lustre Generational Performance Improvements & New Features
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
 
Ccnp3 lab 3_1_en (hacer)
Ccnp3 lab 3_1_en (hacer)Ccnp3 lab 3_1_en (hacer)
Ccnp3 lab 3_1_en (hacer)
 
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitchDPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technology
 
Testing Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with SherlockTesting Persistent Storage Performance in Kubernetes with Sherlock
Testing Persistent Storage Performance in Kubernetes with Sherlock
 
Atf 3 q15-2 - product preview
Atf 3 q15-2 - product previewAtf 3 q15-2 - product preview
Atf 3 q15-2 - product preview
 

Plus de Susant Sahani

How to debug systemd problems fedora project
How to debug systemd problems   fedora projectHow to debug systemd problems   fedora project
How to debug systemd problems fedora projectSusant Sahani
 
Systemd vs-sys vinit-cheatsheet.jpg
Systemd vs-sys vinit-cheatsheet.jpgSystemd vs-sys vinit-cheatsheet.jpg
Systemd vs-sys vinit-cheatsheet.jpgSusant Sahani
 
Systemd for administrators
Systemd for administratorsSystemd for administrators
Systemd for administratorsSusant Sahani
 
Systemd mlug-20140614
Systemd mlug-20140614Systemd mlug-20140614
Systemd mlug-20140614Susant Sahani
 
Summit demystifying systemd1
Summit demystifying systemd1Summit demystifying systemd1
Summit demystifying systemd1Susant Sahani
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regressionSusant Sahani
 
Systemd for administrators
Systemd for administratorsSystemd for administrators
Systemd for administratorsSusant Sahani
 
Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user spaceSusant Sahani
 
Synchronization linux
Synchronization linuxSynchronization linux
Synchronization linuxSusant Sahani
 

Plus de Susant Sahani (20)

systemd
systemdsystemd
systemd
 
systemd
systemdsystemd
systemd
 
How to debug systemd problems fedora project
How to debug systemd problems   fedora projectHow to debug systemd problems   fedora project
How to debug systemd problems fedora project
 
Systemd vs-sys vinit-cheatsheet.jpg
Systemd vs-sys vinit-cheatsheet.jpgSystemd vs-sys vinit-cheatsheet.jpg
Systemd vs-sys vinit-cheatsheet.jpg
 
Systemd cheatsheet
Systemd cheatsheetSystemd cheatsheet
Systemd cheatsheet
 
Systemd
SystemdSystemd
Systemd
 
Systemd for administrators
Systemd for administratorsSystemd for administrators
Systemd for administrators
 
Pdf c1t tlawaxb
Pdf c1t tlawaxbPdf c1t tlawaxb
Pdf c1t tlawaxb
 
Systemd mlug-20140614
Systemd mlug-20140614Systemd mlug-20140614
Systemd mlug-20140614
 
Summit demystifying systemd1
Summit demystifying systemd1Summit demystifying systemd1
Summit demystifying systemd1
 
Systemd evolution revolution_regression
Systemd evolution revolution_regressionSystemd evolution revolution_regression
Systemd evolution revolution_regression
 
Systemd for administrators
Systemd for administratorsSystemd for administrators
Systemd for administrators
 
Systemd poettering
Systemd poetteringSystemd poettering
Systemd poettering
 
Interface between kernel and user space
Interface between kernel and user spaceInterface between kernel and user space
Interface between kernel and user space
 
Week3 binary trees
Week3 binary treesWeek3 binary trees
Week3 binary trees
 
Trees
TreesTrees
Trees
 
Synchronization linux
Synchronization linuxSynchronization linux
Synchronization linux
 
Demo preorder-stack
Demo preorder-stackDemo preorder-stack
Demo preorder-stack
 
Bacnet white paper
Bacnet white paperBacnet white paper
Bacnet white paper
 
Api presentation
Api presentationApi presentation
Api presentation
 

Dernier

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 

Dernier (20)

FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Van jaconson netchannels

  • 1. Speeding up Networking Van Jacobson van@packetdesign.com Bob Felderman feldy@precisionio.com Precision I/O Linux.conf.au 2006 Dunedin, NZ
  • 2. This talk is not about ‘fixing’ the Linux networking stack The Linux networking stack isn’t broken. care of • The people who takedo goodthe stack know what they’re doing & work. on • Based has all the measurements I’m aware of, of Linux the fastest & most complete stack any OS. LCA06 - Jan 27, 2006 - Jacobson / Felderman 2
  • 3. This talk is about fixing an architectural problem created a long time ago in a place far, far away. . . LCA06 - Jan 27, 2006 - Jacobson / Felderman 3
  • 4. LCA06 - Jan 27, 2006 - Jacobson / Felderman 4
  • 5. In the beginning . . . ARPA created MULTICS • First OS networking stack (MIT, 1970) multi-user ‘super-computer’ • Ran on a @ 0.4 MIPS) (GE-640 than100 • Rarely feweruser. users; took ~2 minutes to page in a • Since ARPAnet performance depended only on how fast host could empty its 6 IMP buffers, had to put stack in kernel. LCA06 - Jan 27, 2006 - Jacobson / Felderman 4
  • 6. The Multics stack begat many other stacks . . . • First TCP/IP stack done on Multics (1980) went to • People from that projectBerkeley BBN to do first TCP/IP stack for Unix (1983). used BBN stack as • Berkeley CSRGfor 4.1c BSD stack (1985). functional spec University of • CSRG wins long battle withstack source California lawyers & makes available under ‘BSD copyright’ (1987). LCA06 - Jan 27, 2006 - Jacobson / Felderman 5
  • 7. Multics architecture, as elaborated by Berkeley, became ‘Standard Model’ Interrupt level Task level System ISR packet LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint skb Application Socket read() byte stream 6
  • 8. “The way we’ve always done it” is not necessarily the same as “the right way to do it” There are a lot of problems associated with this style of implementation . . . LCA06 - Jan 27, 2006 - Jacobson / Felderman 7
  • 9. Protocol Complication • Since data is received by destination kernel, “window” was added to distinguish between “data has arrived” & “data was consumed”. addition more than triples the size • Thisprotocol (window probes, persist of the states) and is responsible for at least half the interoperability issues (Silly Window Syndrome, FIN wars, etc.) LCA06 - Jan 27, 2006 - Jacobson / Felderman 8
  • 10. Internet Stability • You can view a network connection as a servo-loop: S R • A kernel-based protocol implementation converts this to two coupled loops: • S K R A very general theorem (Routh-Hurwitz) says that the two coupled loops will always be less stable then one. also hides the receiving • The kernel loopthe sender which screws app dynamics from up the RTT estimate & causes spurious retransmissions. LCA06 - Jan 27, 2006 - Jacobson / Felderman 9
  • 11. Compromises Even for a simple stream abstraction like TCP, there’s no such thing as a “one size fits all” protocol implementation. • The packetization and send strategies are completely different for bulk data vs. transactions vs. event streams. ack strategies are completely • Thestreaming vs. request response.different for Some of this can be handled with sockopts but some app / kernel implementation mismatch is inevitable. LCA06 - Jan 27, 2006 - Jacobson / Felderman 10
  • 12. Performance (the topic for the rest of this talk) have • Kernel-based implementations oftenuser). extra data copies (packet to skb to • Kernel-based implementations often have extra boundary crossings (hardware interrupt to software interrupt to context switch to syscall return). • Kernel-based implementations often have lock contention and hotspots. LCA06 - Jan 27, 2006 - Jacobson / Felderman 11
  • 13. Why should we care? gotten fast • Networking gear hasenough ($10enough8 (10Gb/s) and cheap for an port Gb switch) that it’s changing from a communications technology to a backplane technology. • The huge mismatch between processor clock rate & memory latency has forced chip makers to put multiple cores on a die. LCA06 - Jan 27, 2006 - Jacobson / Felderman 12
  • 14. Why multiple cores? • Vanilla 2GHz P4 issues 2-4 instr / clock  4-8 instr / ns. • Internal structure of DRAM chip makes cache line fetch take 50-100ns (FSB speed doesn’t matter). did 400 instructions of computing • If you cache line, system would be 50% on every efficient with one core & 100% with two. more like 20 / line • Typical number iswith one core instrcores or 2.5% efficient (20 for 100%). LCA06 - Jan 27, 2006 - Jacobson / Felderman 13
  • 15. Good system performance comes from having lots of cores working independently • This is the canonical Internet problem. is called the “end-to-end • The solutionsays you should push all work principle”. It to the edge & do the absolute minimum inside the net. LCA06 - Jan 27, 2006 - Jacobson / Felderman 14
  • 16. The end of the wire isn’t the end of the net On a uni-processor it doesn’t matter but on a multi-processor the protocol work should be done on the processor that’s going to consume the data. This means ISR & Softint should () do almost nothing and Socket ead r t should do everything. cke So ISR Softint Socket read() S oc ket rea d () LCA06 - Jan 27, 2006 - Jacobson / Felderman 15
  • 17. How good is the stack at spreading out the work? Let’s look at some LCA06 - Jan 27, 2006 - Jacobson / Felderman 16
  • 18. Test setup Dell Poweredge 1750s (2.4GHz P4 Xeon, • Two processor, hyperthreading off) hooked up dual back-to-back via Intel e1000 gig ether cards. 2.6.15 • Running stock(6.3.9). plus current Sourceforge e1000 driver with oprofile • Measurements done runs. Showing 0.9.1. Each 5. test was 5 5-minute median of • booted with idle=poll_idle. Irqbalance off. LCA06 - Jan 27, 2006 - Jacobson / Felderman 17
  • 19. 15 Digression: comparing two profiles ● e1000_intr ● 10 ● e1000_clean ● ● __copy_user_intel 5 schedule __switch_to ● ● tcp_v4_rcv ● 0 device interrupt & app on different cpu _raw_spin_lock ● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●●● ●● ● ● ● ● ●● ● ●● ●● ●● ● ● ●● ●●● ● ● ●● ●● ● ● ● ● ●● ● ●● ●● ●● ●● ●● ● 0 ● ● 5 10 15 device interrupt & app on same cpu LCA06 - Jan 27, 2006 - Jacobson / Felderman 18
  • 20. Uni vs. dual processor (netperf) with cpu • 1cpu: run netservercpu as e1000 interrupts. affinity set to same netserver with • 2cpu: runcpu from e1000 cpu affinity set to different interrupts. (%) Busy 1cpu 50 2cpu 77 Intr 7 9 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 16 24 8 14 5 12 App 1 1 19
  • 21. Uni vs. dual processor (netperf) with cpu • 1cpu: run netservercpu as e1000 interrupts. affinity set to same netserver with • 2cpu: runcpu from e1000 cpu affinity set to different interrupts. (%) Busy 1cpu 50 2cpu 77 Intr 7 9 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 16 24 8 14 5 12 App 1 1 19
  • 22. This is just Amdahl’s law in action When adding additional processors: • Benefit (cycles to do work) grows at most linearly. competition, • Cost (contention, grows quadratically. serialization, etc.) • System capacity goes as C(n) = an - bn2 For big enough n, the quadratic always wins. • The key to good scaling is to minimize b. LCA06 - Jan 27, 2006 - Jacobson / Felderman 20
  • 23. Locking destroys performance two ways has multiple writers • The lock(fabulously expensive) so each has to do a RFO cache cycle. update which • The lock requires an atomicthe cache. is implemented by freezing To go fast you want to have a single writer per line and no locks. LCA06 - Jan 27, 2006 - Jacobson / Felderman 21
  • 24. • Networking involves a lot of queues. They’re often implented as doubly linked lists: the child • This isuser poster write for cache thrashing. Every has to every line and every change has to be made multiple places. network • Since most consumercomponents have a producer / relationship, a lock free fifo can work a lot better. LCA06 - Jan 27, 2006 - Jacobson / Felderman 22
  • 25. net_channel - a cache aware, cache friendly queue typedef struct { uint16_t tail; uint8_t wakecnt; uint8_t pad; } net_channel_producer_t; typedef struct { uint16_t head; uint8_t wakecnt; uint8_t wake_type; void* wake_arg; } net_channel_consumer_t; struct { /* next element to add */ /* do wakeup if != consumer wakecnt */ /* /* /* /* next element to remove */ increment to request wakeup */ how to wakeup consumer */ opaque argument to wakeup routine */ net_channel_producer_t p CACHE_ALIGN; uint32_t q[NET_CHANNEL_Q_ENTRIES]; net_channel_consumer_t c; } net_channel_t ; LCA06 - Jan 27, 2006 - Jacobson / Felderman /* producer's header */ /* consumer's header */ 23
  • 26. net_channel (cont.) #define NET_CHANNEL_ENTRIES 512 /* approx number of entries in channel q */ #define NET_CHANNEL_Q_ENTRIES ((ROUND_UP(NET_CHANNEL_ENTRIES*sizeof(uint32_t),CACHE_LINE_SIZE) - sizeof(net_channel_producer_t) - sizeof(net_channel_consumer_t)) / sizeof(uint32_t)) #define CACHE_ALIGN __attribute__((aligned(CACHE_LINE_SIZE))) static inline void net_channel_queue(net_channel_t *chan, uint32_t item) { uint16_t tail = chan->p.tail; uint16_t nxt = (tail + 1) % NET_CHANNEL_Q_ENTRIES; if (nxt != chan->c.head) { chan->q[tail] = item; STORE_BARRIER; chan->p.tail = nxt; if (chan->p.wakecnt != chan->c.wakecnt) { ++chan->p.wakecnt; net_chan_wakeup(chan); } } } LCA06 - Jan 27, 2006 - Jacobson / Felderman 24
  • 27. “Channelize” driver • Remove e1000 driver hard_start_xmit &in napi_poll routines. No softint code left driver & no skb’s (driver deals only in packets). • Send packets to generic_napi_poll via a net channel. (%) Busy 1cpu 50 2cpu 77 drvr 58 Intr 7 9 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 16 24 16 8 14 9 5 12 9 App 1 1 1 25
  • 28. “Channelize” driver • Remove e1000 driver hard_start_xmit &in napi_poll routines. No softint code left driver & no skb’s (driver deals only in packets). • Send packets to generic_napi_poll via a net channel. (%) Busy 1cpu 50 2cpu 77 drvr 58 Intr 7 9 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 16 24 16 8 14 9 5 12 9 App 1 1 1 25
  • 29. “Channelize” socket “registers” transport signature with • socket on “accept()”. Gets back a channel. driver with matching • driver drops all packetschannel & wakes app signature into socket’s if sleeping in socket code. Socket code processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock Busy Intr 50 77 58 28 7 9 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 16 24 16 16 8 14 9 1 5 12 9 3 App 1 1 1 1 26
  • 30. “Channelize” socket “registers” transport signature with • socket on “accept()”. Gets back a channel. driver with matching • driver drops all packetschannel & wakes app signature into socket’s if sleeping in socket code. Socket code processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock Busy Intr 50 77 58 28 7 9 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 16 24 16 16 8 14 9 1 5 12 9 3 App 1 1 1 1 26
  • 31. “Channelize” socket “registers” transport signature with • socket on “accept()”. Gets back a channel. driver with matching • driver drops all packetschannel & wakes app signature into socket’s if sleeping in socket code. Socket code processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock Busy Intr 50 77 58 28 7 9 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 16 24 16 16 8 14 9 1 5 12 9 3 App 1 1 1 1 26
  • 32. “Channelize” App • App “registers” transport signature. Gets back an (mmaped) channel & buffer pool. matching packets into • driver drops sleeping. TCP stack in channel & wakes app if library processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock App Busy Intr 50 77 58 28 14 7 9 6 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 0 16 24 16 16 0 8 14 9 1 0 5 12 9 3 2 App 1 1 1 1 5 27
  • 33. “Channelize” App • App “registers” transport signature. Gets back an (mmaped) channel & buffer pool. matching packets into • driver drops sleeping. TCP stack in channel & wakes app if library processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock App Busy Intr 50 77 58 28 14 7 9 6 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 0 16 24 16 16 0 8 14 9 1 0 5 12 9 3 2 App 1 1 1 1 5 27
  • 34. “Channelize” App • App “registers” transport signature. Gets back an (mmaped) channel & buffer pool. matching packets into • driver drops sleeping. TCP stack in channel & wakes app if library processes packet(s) on wakeup. (%) 1cpu 2cpu drvr sock App Busy Intr 50 77 58 28 14 7 9 6 6 6 LCA06 - Jan 27, 2006 - Jacobson / Felderman Softint Socket Locks Sched 11 13 12 0 0 16 24 16 16 0 8 14 9 1 0 5 12 9 3 2 App 1 1 1 1 5 27
  • 35. 10Gb/s ixgb netpipe tests NPtcp streaming test between two nodes. NPtcp ping-pong test between two nodes (one-way latency measured). (4.3Gb/s throughput limit due to DDR333 memory; cpus were loafing) LCA06 - Jan 27, 2006 - Jacobson / Felderman 28
  • 36. more 10Gb/s NPtcp ping-pong test between two nodes (one-way latency measured). LCA06 - Jan 27, 2006 - Jacobson / Felderman 29
  • 37. more 10Gb/s LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes) SendRecv bandwidth (bigger is better) LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes) SendRecv Latency (smaller is better) LCA06 - Jan 27, 2006 - Jacobson / Felderman 30
  • 38. more 10Gb/s LAM MPI: Intel MPI Benchmark (IMB) using 4 boxes (8 processes) SendRecv Latency (smaller is better) LCA06 - Jan 27, 2006 - Jacobson / Felderman 31
  • 39. Conclusion relatively trivial changes, it’s • With some finish the good work started by possible to NAPI & get rid of almost all the interrupt / softint processing. • As a result, everything gets a lot faster. • Get linear scalability on multi-cpu / multicore systems. LCA06 - Jan 27, 2006 - Jacobson / Felderman 32
  • 40. Conclusion (cont.) • Drivers get simpler (hard_start_xmit & napi_poll become generic; drivers only service hardware interrupts). can • Anythinglocks,send or receive packets, without very cheaply. • Easy, incremental transition strategy. LCA06 - Jan 27, 2006 - Jacobson / Felderman 33