pps Matters
Muhammad Moinur Rahman

moin@bofh.im
What is a switch/router?
• A switch forwards frames based on MAC addresses

• A router forwards packets based on IP addresses
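
The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration: the table contents, port names and function names are hypothetical, and real devices do these lookups in hardware tables rather than in dictionaries and lists.

```python
import ipaddress

# Hypothetical forwarding state, for illustration only.
MAC_TABLE = {
    "aa:bb:cc:00:00:01": "port1",
    "aa:bb:cc:00:00:02": "port2",
}

ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "uplink"),
]


def switch_lookup(dst_mac):
    """Exact match on the destination MAC; unknown addresses are flooded."""
    return MAC_TABLE.get(dst_mac.lower(), "flood")


def route_lookup(dst_ip):
    """Longest-prefix match on the destination IP address."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [(net, out) for net, out in ROUTES if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    best = max(matches, key=lambda m: m[0].prefixlen)
    return best[1]
```

A switch needs only an exact match on a flat address, while a router has to pick the most specific of several overlapping prefixes.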
What is a Software Switch/Router?
• Software based implementations

• Routers

• BIRD, FRR, Zebra, Quagga, ExaBGP

• Switches

• Open vSwitch

• Mostly installable in a virtualized environment or on a *nix system
What is a Hardware Switch/Router?
• Manufactured by big names like Cisco, Juniper, Arista, Extreme, Nokia

• Comes with a price tag

• Sometimes comes in a really big size

• Has different and multiple ports

• N × 1/10/25/40/50/100/400 Gbit/s

• So much jargon

• ASIC/Merchant Silicon

• Gbps/Tbps backplane capacity

• Gbps/Tbps forwarding capacity

• Kpps/Mpps forwarding

• Line rate forwarding
What is ASIC/merchant Silicon?
• ASIC Miners - Just one example

• Application Specific Integrated Circuits

• Some applications

• Bitcoin Miner

• Voice Recorder

• Cryptographic Accelerator

• Network Switches

• Firewalls

• The new lingo for DC switches is "silicon"

• Off-the-shelf or custom-built ASICs

• Broadcom and Cavium are some of the silicon manufacturers

• Broadcom's Tomahawk is a flagship switch ASIC
The BIG Questions
1. If there are open source switches/routers, why do we need to buy expensive vendor devices?

2. Why use silicon or chips instead of generic x86 processors?

3. *nix OSes can do anything. Why don't we install those apps and get rid of hardware vendors?
x86 vs ASIC
• x86

• Jack of all trades, master of none

• CPU and PCI interrupts

• PCIe bandwidth is limited and depends on the CPU architecture

• ASIC

• Master of one

• No interrupts

• Forwarding does not cross a host PCIe bus, so it is not limited by PCIe bandwidth
POSIX poses
• POSIX sockets evolved from Berkeley sockets

• BSD sockets have been the de facto standard since 4.2BSD Unix

• Adopted everywhere from Linux to Windows

• Basic life cycle

• socket(), bind(), listen(), accept(), sendmsg(), recvmsg()

• Network stacks are implemented in-kernel

• So these functions are system calls

• Higher overhead from context switches and CPU cache pollution

• Back-and-forth game between multi-core CPUs and multi-queue NICs

• Socket buffers (skb) or network memory buffers (mbuf) stress the OS memory allocators
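
The basic life cycle can be walked through on the loopback interface. This is a minimal sketch, assuming a Unix-like host (Python exposes `sendmsg()` and `recvmsg()` only there); the function name `loopback_roundtrip` and the use of port 0 are choices made for this example. Each commented call crosses into the kernel as a system call.

```python
import socket


def loopback_roundtrip(payload=b"ping"):
    """Walk one message through the classic socket life cycle on loopback."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind(("127.0.0.1", 0))    # bind(); port 0 lets the OS pick a port
    server.listen(1)                 # listen()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())  # completes via the listen backlog

    conn, _peer = server.accept()                        # accept()
    client.sendmsg([payload])                            # sendmsg()
    data, _ancdata, _flags, _addr = conn.recvmsg(4096)   # recvmsg()

    for s in (conn, client, server):
        s.close()
    return data
```

Even this tiny exchange needs several kernel entries and exits for a single payload, which is the overhead the bullets above refer to.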
Mind the GAP
• A minimum pause is required between packets or frames

• Interpacket gap/interframe spacing/interframe gap

• The standard is 96 bit times

• 9.6 µs for 10 Mbit/s Ethernet

• 0.96 µs for 100 Mbit/s (Fast) Ethernet

• 96 ns for Gigabit Ethernet

• 38.4 ns for 2.5 Gigabit Ethernet

• 19.2 ns for 5 Gigabit Ethernet

• 9.6 ns for 10 Gigabit Ethernet

• 2.4 ns for 40 Gigabit Ethernet

• 0.96 ns for 100 Gigabit Ethernet
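
All of the figures above follow from dividing 96 bit times by the line rate. A small sketch of that arithmetic (the function name `ifg_ns` is made up for this example):

```python
IFG_BITS = 96  # standard interpacket gap, in bit times


def ifg_ns(rate_bps):
    """Duration of the 96-bit-time gap at a given line rate, in nanoseconds."""
    return IFG_BITS / rate_bps * 1e9


for label, rate in [("10 Mbit/s", 10e6), ("1 Gbit/s", 1e9),
                    ("10 Gbit/s", 10e9), ("100 Gbit/s", 100e9)]:
    print(f"{label}: {ifg_ns(rate):.2f} ns")
```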
run KERNEL run
• Kernel processing time for a 1538-byte frame on the wire (1500-byte MTU + Ethernet header, FCS, preamble and interpacket gap)

• at 10Gbit/s == 1230.4 ns between packets (813Kpps)

• at 40Gbit/s == 307.6 ns between packets (3.25Mpps)

• at 100Gbit/s == 123.0 ns between packets (8.13Mpps)

• Smallest frame size of 84 bytes on the wire (64-byte frame + preamble and interpacket gap)

• at 10Gbit/s == 67.2 ns between packets (14.88Mpps)

• CPU budget

• 67.2ns => 201 cycles (@3GHz)
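
The per-packet budget is plain arithmetic: the bits on the wire divided by the line rate give the time between packets, and that time multiplied by the clock rate gives the cycles available. A sketch with illustrative function names:

```python
def gap_ns(wire_bytes, rate_bps):
    """Time one frame occupies the wire (including overhead), in nanoseconds."""
    return wire_bytes * 8 / rate_bps * 1e9


def packets_per_second(wire_bytes, rate_bps):
    """Maximum packet rate for back-to-back frames of this size."""
    return rate_bps / (wire_bytes * 8)


def cycles(budget_ns, cpu_hz):
    """Whole CPU cycles available within a time budget."""
    return int(budget_ns * 1e-9 * cpu_hz)


budget = gap_ns(84, 10e9)          # smallest frame at 10 Gbit/s
print(f"{budget:.1f} ns, {packets_per_second(84, 10e9) / 1e6:.2f} Mpps, "
      f"{cycles(budget, 3e9)} cycles at 3 GHz")
```

At roughly 200 cycles per packet there is very little room left once a system call or a cache miss is involved.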
OS Limitation
• Most OSes are jacks of all trades and masters of none

• Desktop, Mail Server, Web Server, DNS Server

• Graphics Rendering, Gaming, Day to Day work

• They are not designed for high-performance packet processing

• Not optimized for line rate packet processing

• Vyatta and BSDRP are a few network-focused alternatives

• Lots of other commercial OSes

• That is not the END GAME
kernel bypass
zero-copy
• The CPU skips the task of copying data from one memory area to another

• Saves CPU cycles

• Saves memory bandwidth

• OS elements

• Device Driver

• File Systems

• Network Protocol Stack

• Zero-copy versions

• Reduce the number of mode switches between kernel space and user-space applications

• Mostly use raw sockets with mmap (memory map)

• Kernel bypass utilizes zero-copy, but they are not the same
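
The idea of not copying can be shown with a user-space analogy. This is not kernel zero-copy and not an mmap'ed packet ring; it is a sketch, assuming a Unix-like host, of receiving into one preallocated buffer and handing out views of it instead of fresh copies. The names `receive_into` and `demo` are hypothetical.

```python
import socket


def receive_into(buf, sock):
    """Fill a preallocated buffer in place and return a view, not a copy."""
    nbytes = sock.recv_into(buf)
    return memoryview(buf)[:nbytes]


def demo():
    a, b = socket.socketpair()
    buf = bytearray(2048)          # allocated once, reusable for every packet
    a.sendall(b"hello")
    view = receive_into(buf, b)
    same_memory = view.obj is buf  # the slice still refers to buf itself
    data = bytes(view)             # an explicit copy, made only when needed
    a.close()
    b.close()
    return data, same_memory
```

The kernel still copies the data into `buf` once here; true zero-copy paths remove that copy as well by letting the NIC and the application share the same memory.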
RDMA
• Remote Direct Memory Access

• Implemented over high speed, low-latency networks(fabrics)

• Direct access to remote host’s memory

• Dramatically reduces latency and CPU overhead

• Requires specialized hardware, especially a NIC with RDMA support

• Bypasses the remote or local operating system

• Transfers data directly between the wire and application memory

• Bypasses CPU, cache and context switching

• Transfers continue in parallel with OS operations without affecting OS performance

• Applications may or may not be RDMA-aware
RDMA (continued)
• Link layer protocol can be

• Ethernet

• iWARP (Internet Wide Area RDMA Protocol) combines with TCP Offload Engine

• NVMe over Fabrics (NVMe-oF)

• iSCSI Extensions for RDMA (iSER)

• SMB Direct

• Sockets Direct Protocol (SDP)

• SCSI RDMA Protocol (SRP)

• NFS over RDMA

• GPUDirect

• InfiniBand

• Oldest RDMA implementation

• Main manufacturers were Intel and Mellanox

• Mostly used in supercomputing environments

• Ethernet can be run over InfiniBand

• Omni-Path

• Low-latency networking architecture by Intel
RoCE
• RDMA over Converged Ethernet

• Two versions

• RoCEv1 is an Ethernet link layer protocol with Ethertype 0x8915

• RoCEv2 runs at the internet layer over UDP/IPv4 and UDP/IPv6

• Routable RoCE is another name for v2 because it can be routed

• Also runs over non-converged Ethernet

• RoCE vs InfiniBand

• RoCE requires lossless Ethernet

• RoCE vs iWARP

• RoCE performs RDMA over Ethernet/UDP whereas iWARP uses TCP

• Some of the vendors are

• Nvidia -> Mellanox

• Broadcom -> Emulex

• Cavium -> QLogic/Marvell Technology
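
The two RoCE versions can be told apart from the frame headers alone. The sketch below is an illustration under stated assumptions: it handles untagged Ethernet frames and IPv4 only, and it relies on UDP destination port 4791, the IANA-assigned port for RoCEv2 (not stated on the slide). The function name `classify` is made up for this example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ROCE_V1 = 0x8915
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791  # IANA-assigned destination port for RoCEv2


def classify(frame):
    """Return 'RoCEv1', 'RoCEv2' or 'other' for an untagged Ethernet frame."""
    if len(frame) < 14:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCEv1"
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= 14 + 20:
        ihl = (frame[14] & 0x0F) * 4   # IPv4 header length in bytes
        proto = frame[14 + 9]          # IPv4 protocol field
        udp_off = 14 + ihl
        if proto == IPPROTO_UDP and len(frame) >= udp_off + 4:
            (dport,) = struct.unpack_from("!H", frame, udp_off + 2)
            if dport == ROCE_V2_UDP_PORT:
                return "RoCEv2"
    return "other"
```

Because v2 is ordinary UDP/IP on the outside, any IP router can forward it, which is where the "routable" name comes from.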
The Cool People of Internet
• Connection Establishment (SYN;SYN-ACK;ACK)

• Acknowledgement of traffic receipt

• Checksum and Sequence

• Sliding Window Calculation

• Congestion Control

• Connection Termination
TOE(TCP Offload Engine)
• Offloads the kernel TCP stack to the NIC

• Frees up host CPU cycles

• Reduces traffic across the PCI bus between the NIC and the host CPU

• Types

• Parallel-Stack Full Offload

• Host OS TCP/IP stack and parallel stack with “vampire tap”

• HBA full Offload

• Host Bus Adapter used mainly in iSCSI host adapters

• Besides TCP it also offloads iSCSI functions

• TCP chimney partial Offload

• Mainly Microsoft lingo, but often used interchangeably

• Selected parts of the TCP stack are offloaded
tso/lro
• TCP Segmentation Offload

• Big chunks of data are split into multiple packets by the NIC before transmission

• The segment size depends on the MTU of the link between networking devices

• The NIC calculates and splits the data when this is offloaded from the host OS

• Large Receive Offload

• Just the opposite

• Multiple packets of a single stream are aggregated into a single buffer before being handed over to the host OS, reducing CPU cycles
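
What the NIC does in both directions can be sketched in software. This is only an illustration of the splitting and merging, not of any driver API; the function names are made up, and the 1460-byte segment size used in the example assumes a 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers without options.

```python
def segment(payload, mss):
    """TSO-style split: cut one large buffer into MSS-sized segments."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments):
    """LRO-style merge: join in-order segments of one stream into one buffer."""
    return b"".join(segments)


segments = segment(bytes(4000), 1460)
print([len(s) for s in segments])   # sizes of the segments sent on the wire
```

With offload enabled the host stack handles one large buffer per call instead of one packet per call, which is where the CPU savings come from.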
chksum
• Although it is a weak check compared to modern methods, TCP needs error checking

• Uses a one's complement sum

• This is CPU-intensive work

• But it can be offloaded to the NIC if supported

• Offloading has some disadvantages:

• Packet analyzers running on the host may report invalid checksums for outgoing packets, because the NIC fills the checksum in after the capture point

• It can cause problems on virtualization platforms whose virtual NIC drivers lack checksum offload support
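
The one's complement checksum itself is short. The sketch below follows the algorithm described in RFC 1071 and checks it against the worked example from that RFC; it covers only the raw 16-bit sum, so the TCP pseudo-header that a real stack also includes is left out. The function name is chosen for this example.

```python
def internet_checksum(data):
    """16-bit one's complement of the one's complement sum (RFC 1071 style)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")   # example words from RFC 1071
print(hex(internet_checksum(sample)))
```

A receiver runs the same sum over the data plus the transmitted checksum; a result of zero means no error was detected.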
eco systems for fast packet processing
• There are lots of frameworks

• From open source to commercial

• Sometimes tightly coupled with a vendor

• Especially Network Interface Card vendors

• But there are open standards too

• Some ecosystems are VNF-friendly or offer application development APIs for building new solutions

• Commercial ones are really costly compared to the price of a NIC
xdp (eXpress Data Path)
• In Linux Kernel since 4.8

• eBPF based high performance Data path

• A new address family, AF_XDP, similar to AF_PACKET

• Only supported on Intel and Mellanox cards

• The eBPF program is offloaded to the NIC; where drivers are unavailable it is processed by the CPU and performs slower

• A drop test of 26 Mpps per core has been demonstrated on commodity hardware

• Designed for programmability

• This is not kernel bypass but rather integrated fast-path in kernel

• Works seamlessly with kernel TCP stack
pf_ring
• Available for Linux kernels 2.6.32 and newer

• Loadable kernel module

• 10 Gbit Hardware Packet Filtering using commodity network adapters

• Device driver independent

• Libpcap support for seamless integration with existing pcap-based applications.

• ZC version requires a commercial license per MAC address

• User-space ZC (new generation DNA, Direct NIC Access) drivers for extreme packet capture/transmission speed, as the NIC NPU (Network Process Unit) pushes/gets packets to/from userland without any kernel intervention. Using the 10Gbit ZC driver you can send/receive at wire speed at any packet size.

• PF_RING ZC library for distributing packets in zero-copy across threads, applications, Virtual Machines.

• Support of Accolade, Exablaze, Endace, Fiberblaze, Inveatech, Mellanox, Myricom/CSPI, Napatech, Netcope and
Intel (ZC) network adapters

• Kernel-based packet capture and sampling

• Ability to specify hundreds of header filters in addition to BPF

• Content inspection, so that only packets matching the payload filter are passed

• PF_RING™ plugins for advanced packet parsing and content filtering

• Works pretty well within ntop ecosystem
DPDK(Data Plane Development Kit)
• Set of Data Plane libraries and NIC drivers

• Maintained under the Linux Foundation, but BSD licensed

• Programming framework for x86, ARM and PowerPC

• An Environment Abstraction Layer (EAL) hides the specifics of each hardware/software environment

• Supports lots of hardware

• AMD, Amazon, Aquantia, Atomic Rules, Broadcom, Cavium, Chelsio,
Cisco, Intel, Marvell, Mellanox, NXP, Netcope, Solarflare

• Extensible to different architectures and systems such as Intel IA-32 and FreeBSD
fd.io (Fast Data Input/Output)
• Run by LFN, the LF (Linux Foundation) Networking Fund

• Cisco donated the VPP (Vector Packet Processing) library to fd.io

• This library has been in production by Cisco since 2003

• Leverages DPDK capabilities

• Aligned to support NFV and SDN

• OPNFV is a sibling project of fd.io under LFN
netmap
• A novel framework which utilizes known techniques to reduce packet-processing costs

• A fast packet I/O mechanism between the NIC and user space

• Removes unnecessary metadata (e.g. sk_buff) allocation

• Amortizes system call costs, reduces or removes data copies

• Supported on both FreeBSD and Linux as a loadable kernel module

• Included by default since FreeBSD 11.0

• Released under the BSD 2-Clause license; FreeBSD is the primary development platform

• Supported on Intel, Realtek and Chelsio cards

• 14.8 Mpps achieved on a 10G NIC with a 900 MHz CPU

• Chelsio has tested 100G traffic in netmap mode with a 99.99% success rate
Other ecosystems
• OpenOnload by Solarflare

• Napatech
References
• pf_ring https://www.ntop.org

• DPDK https://www.dpdk.org

• fd.io https://fd.io

• netmap http://info.iet.unipi.it/~luigi/netmap/
Questions
Thank You

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Dernier (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

pps Matters

  • 1. pps Matters Muhammad Moinur Rahman moin@bofh.im
  • 2. What is a switch/router? • A switch forwards frames based on MAC address • A router forwards packets based on IP address
  • 3. What is a Software Switch/Router? • Software based implementations • Routers • BIRD, FRR, Zebra, Quagga, ExaBGP • Switches • Open vSwitch • Mostly installable in a Virtualized Environment or on a *nix environment
  • 4. What is a Hardware Switch/Router? • Manufactured by big names like Cisco, Juniper, Arista, Extreme, Nokia • Comes with a price tag • Sometimes comes in a really big size • Has multiple ports of different speeds • N x 1/10/25/40/50/100/400 Gbit/s • Lots of jargon • ASIC/Merchant Silicon • Gbps/Tbps backplane capacity • Gbps/Tbps forwarding capacity • Kpps/Mpps forwarding • Line-rate forwarding
  • 5. What is ASIC/merchant Silicon? • ASIC Miners - Just one example • Application Specific Integrated Circuits • Some applications • Bitcoin Miner • Voice Recorder • Cryptographic Accelerator • Network Switches • Firewalls • New Lingo for DC Switches is Silicon • Off the shelf or Custom Built ASICs • Broadcom, Cavium are some Silicon Manufacturers • Broadcom Tomahawk is the flagship ASIC
  • 6.
  • 7. The BIG Questions 1. If there are open source switches/routers, why do we need to buy price-tagged vendor devices? 2. Why use silicon or chips instead of generic x86 processors? 3. A *nix OS can do anything. Why don’t we install those apps and get rid of hardware vendors?
  • 8. x86 vs ASIC • x86 • Jack of all, master of none • CPU and PCI interrupts • Limited PCIe bandwidth and based on CPU arch • ASIC • Master of one • No interrupts • Sky is the limit for PCIe bandwidth
  • 9. POSIX poses • POSIX sockets evolved from Berkeley sockets • BSD sockets have been the de facto standard since 4.2BSD Unix • Adopted everywhere from Linux to Windows • Basic life cycle • socket(), bind(), listen(), accept(), sendmsg(), recvmsg() • Network stacks are implemented in-kernel • So these functions are system calls • Higher overhead from context switches and CPU cache pollution • Back-and-forth game between multi-core CPUs and multi-queue NICs • Socket buffers (skb) or network memory buffers (mbuf) stress OS memory allocators
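The life cycle on slide 9 can be sketched on loopback as follows. This is a minimal illustration, not a benchmark: the helper name `run_echo_once` is hypothetical, and the point is only that each marked call crosses into the kernel at least once.

```python
import socket
import threading


def run_echo_once(payload: bytes) -> bytes:
    """Walk one TCP connection through the BSD socket life cycle (hypothetical helper)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind(("127.0.0.1", 0))       # bind(); port 0 lets the kernel pick a free port
    server.listen(1)                    # listen()
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _peer = server.accept()   # accept()
        with conn:
            data, _anc, _flags, _addr = conn.recvmsg(4096)  # recvmsg()
            conn.sendmsg([data])                            # sendmsg()

    worker = threading.Thread(target=serve)
    worker.start()

    # Client side: connect, send the payload, read the echo back.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendmsg([payload])
        reply, _anc, _flags, _addr = client.recvmsg(4096)

    worker.join()
    server.close()
    return reply
```

Even this tiny exchange needs roughly a dozen system calls across both sides, which is the per-packet overhead the slide is pointing at.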
  • 10. Mind the GAP • Minimal pause required between packets or frames • Interpacket GAP/Interframe spacing/Interframe GAP • The standard is 96 bit times • 9.6 µs for 10 Mbit/s Ethernet • 0.96 µs for 100 Mbit/s (Fast) Ethernet • 96 ns for Gigabit Ethernet • 38.4 ns for 2.5 Gigabit Ethernet • 19.2 ns for 5 Gigabit Ethernet • 9.6 ns for 10 Gigabit Ethernet • 2.4 ns for 40 Gigabit Ethernet • 0.96 ns for 100 Gigabit Ethernet
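The gap durations on slide 10 all follow from 96 bit times divided by the link rate. A small sketch (the function name is illustrative) that reproduces them:

```python
IPG_BITS = 96  # standard interpacket gap, in bit times


def interpacket_gap_ns(link_bps: float) -> float:
    """Duration of the 96-bit interpacket gap, in nanoseconds."""
    return IPG_BITS / link_bps * 1e9


if __name__ == "__main__":
    for label, rate in [("10 Mbit/s", 10e6), ("1 Gbit/s", 1e9),
                        ("10 Gbit/s", 10e9), ("100 Gbit/s", 100e9)]:
        print(f"{label}: {interpacket_gap_ns(rate):.2f} ns")
```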
  • 11. run KERNEL run • Kernel processing time for a 1538-byte frame • at 10 Gbit/s == 1230.4 ns between packets (813 Kpps) • at 40 Gbit/s == 307.6 ns between packets (3.25 Mpps) • at 100 Gbit/s == 123.0 ns between packets (8.13 Mpps) • Smallest frame size of 84 bytes • at 10 Gbit/s == 67.2 ns between packets (14.88 Mpps) • CPU budget • 67.2 ns => 201 cycles (@ 3 GHz)
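The per-packet budget on slide 11 is plain arithmetic, sketched below. The frame sizes are on-wire sizes: 1538 bytes is a 1500-byte payload plus 14 bytes of Ethernet header, 4 bytes of FCS, 8 bytes of preamble and 12 bytes of interpacket gap, and 84 bytes is the 64-byte minimum frame plus the same 20 bytes of preamble and gap. The function name and the 3 GHz default are illustrative.

```python
def packet_budget(frame_bytes: int, link_bps: float, cpu_hz: float = 3e9):
    """Return (ns between packets, packets per second, CPU cycles per packet)."""
    bits = frame_bytes * 8
    ns_between = bits / link_bps * 1e9
    pps = link_bps / bits
    cycles = ns_between * 1e-9 * cpu_hz
    return ns_between, pps, cycles


if __name__ == "__main__":
    for size, rate in [(1538, 10e9), (1538, 40e9), (1538, 100e9), (84, 10e9)]:
        ns, pps, cycles = packet_budget(size, rate)
        print(f"{size} B @ {rate / 1e9:.0f} Gbit/s: "
              f"{ns:.1f} ns, {pps / 1e6:.2f} Mpps, {cycles:.0f} cycles")
```

At the smallest frame size a 3 GHz core has about 200 cycles per packet, which is less than the cost of a single cache miss plus a system call.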
  • 12. OS Limitation • Most OSes are jacks of all trades and masters of none • Desktop, Mail Server, Web Server, DNS Server • Graphics Rendering, Gaming, Day to Day work • They are not designed for high-performance packet processing • Not optimized for line-rate packet processing • Some OSes are tuned for networking: Vyatta and BSDRP, to name a few • Plus lots of commercial OSes • But that is not the END GAME
  • 14. zero-copy • The CPU skips the task of copying data from one memory area to another • Saves CPU cycles • Saves memory bandwidth • OS elements • Device drivers • File systems • Network protocol stack • zero-copy versions • Reduce the number of mode switches between kernel space and user-space applications • Mostly use raw sockets with mmap (memory map) • Kernel bypass utilizes zero-copy, but the two are not the same
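One zero-copy path that is easy to try from user space is `sendfile`, sketched below. It is not the raw-socket-plus-mmap technique named on the slide; it only illustrates the general idea of letting the kernel move data without a round trip through a user-space buffer. Python's `socket.sendfile()` uses `os.sendfile()` where the platform provides it and falls back to ordinary `send()` otherwise. The helper name is hypothetical.

```python
import socket
import tempfile


def send_file_zero_copy(data: bytes) -> bytes:
    """Push a file's contents through a socket pair with socket.sendfile()."""
    left, right = socket.socketpair()            # connected stream sockets
    with left, right, tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        f.seek(0)                                # start of file for the fallback path
        sent = left.sendfile(f)                  # kernel copies file -> socket
        left.shutdown(socket.SHUT_WR)            # signal end of stream to the reader
        chunks = []
        while chunk := right.recv(65536):
            chunks.append(chunk)
    assert sent == len(data)
    return b"".join(chunks)
```

The sketch sends and then receives in a single thread, so it only suits payloads that fit in the socket buffer; a larger transfer would need a concurrent reader.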
  • 15. RDMA • Remote Direct Memory Access • Implemented over high-speed, low-latency networks (fabrics) • Direct access to a remote host’s memory • Dramatically reduces latency and CPU overhead • Requires specialized hardware, especially a NIC with RDMA support • Bypasses the remote and local operating systems • Transfers data directly between the wire and application memory • Bypasses CPU, cache and context switching • Transfers continue in parallel with OS operations without affecting OS performance • Applications may or may not be RDMA aware
  • 16. RDMA (continued) • Link layer protocol can be • Ethernet • iWARP (internet Wide Area RDMA Protocol), combined with a TCP Offload Engine • NVMe over Fabrics (NVMe-oF) • iSCSI Extensions for RDMA (iSER) • SMB Direct • Sockets Direct Protocol (SDP) • SCSI RDMA Protocol (SRP) • NFS over RDMA • GPUDirect • Link layer protocol can be • InfiniBand • One of the oldest RDMA implementations • Main manufacturers were Intel and Mellanox • Mostly used in supercomputing environments • Ethernet can be run over InfiniBand • Omni-Path • Low-latency networking architecture by Intel
  • 17. RoCE • RDMA over Converged Ethernet • Two versions • RoCEv1 works at the Ethernet link layer, using Ethertype 0x8915 • RoCEv2 works at the internet layer, over UDP/IPv4 and UDP/IPv6 • RoCEv2 is also called Routable RoCE because it can be routed • Also runs over non-converged Ethernet • RoCE vs InfiniBand • RoCE requires lossless Ethernet • RoCE vs iWARP • RoCE performs RDMA over Ethernet/UDP whereas iWARP uses TCP • Some of the vendors are • Nvidia -> Mellanox • Broadcom -> Emulex • Cavium -> QLogic/Marvell Technology
  • 18. The Cool People of Internet • Connection Establishment (SYN;SYN-ACK;ACK) • Acknowledgement of traffic receipt • Checksum and Sequence • Sliding Window Calculation • Congestion Control • Connection Termination
  • 19. TOE (TCP Offload Engine) • Offloads the kernel TCP stack to the NIC • Frees up host CPU cycles • Reduces traffic between the PCI bus and the host CPU • Types • Parallel-stack full offload • Host OS TCP/IP stack plus a parallel stack connected with a “vampire tap” • HBA full offload • Host Bus Adapter, used mainly in iSCSI host adapters • Besides TCP it also offloads iSCSI functions • TCP chimney partial offload • Mainly a Microsoft term, but often used interchangeably with partial offload • Selected parts of the TCP stack are offloaded
  • 20. tso/lro • TCP Segmentation Offload • Big chunks of data are split into multiple packets by the NIC before transmission • The size depends on the MTU of the link between networking devices • The NIC calculates and splits the data when it is handed over by the host OS • Large Receive Offload • Just the opposite • Multiple packets of a single stream are aggregated into a single buffer before being handed over to the host OS, reducing CPU cycles
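The splitting and merging described on slide 20 can be sketched in a few lines. This covers payload handling only: a real NIC also rewrites TCP/IP headers, sequence numbers and checksums for every segment. The function names are illustrative, and the MSS of 1460 bytes assumes a 1500-byte MTU with 20-byte IPv4 and 20-byte TCP headers and no options.

```python
def segment(payload: bytes, mss: int) -> list[bytes]:
    """TSO-style split: cut one large buffer into MSS-sized segments."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments: list[bytes]) -> bytes:
    """LRO-style merge: join in-order segments of one stream into one buffer."""
    return b"".join(segments)


if __name__ == "__main__":
    segs = segment(bytes(4000), 1460)    # assumed MSS for a 1500-byte MTU
    print([len(s) for s in segs])
```

With offload the host hands over or receives one large buffer instead of several MTU-sized ones, so the per-packet stack cost is paid once.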
  • 21. chksum • The TCP checksum is weak compared to modern checksum methods, but TCP still needs error checking • Uses a one’s complement algorithm • This is CPU-intensive work • But it can be offloaded to the NIC if supported • Offloading has some disadvantages: • Packet analyzers on the host may report invalid checksums for outbound packets, because they see the packets before the NIC fills in the checksum • Problems on some virtualization platforms whose virtual NIC drivers lack checksum offload support
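The one’s complement algorithm from slide 21 is the 16-bit Internet checksum described in RFC 1071. The sketch below shows the arithmetic only: a real TCP checksum also covers a pseudo-header built from the IP addresses, protocol and length, which is omitted here. The function name is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
    print(hex(internet_checksum(sample)))
```

A receiver verifies by running the same sum over the data with the checksum appended; a result of zero means no error was detected. This loop over every byte of every packet is the work a NIC takes over when checksum offload is enabled.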
  • 22. Ecosystems for fast packet processing • There are lots of frameworks • From open source to commercial • Sometimes tightly coupled with a vendor • Especially a Network Interface Card vendor • But there are open standards too • Some ecosystems are VNF friendly or offer an application development API for building new solutions • Commercial ones are really costly once the price of the NIC is included
  • 23. xdp (eXpress Data Path) • In the Linux kernel since 4.8 • eBPF-based high-performance data path • Adds a new address family, AF_XDP, similar to AF_PACKET • Only supported on Intel and Mellanox cards • The eBPF program is offloaded to the NIC; if driver support is unavailable it is processed on the CPU and performs slower • A drop test at 26 Mpps per core has been run successfully on commodity hardware • Designed for programmability • This is not kernel bypass but rather an integrated fast path in the kernel • Works seamlessly with the kernel TCP stack
  • 24. pf_ring • Available for Linux kernels 2.6.32 and newer • Loadable kernel module • 10 Gbit hardware packet filtering using commodity network adapters • Device driver independent • Libpcap support for seamless integration with existing pcap-based applications • ZC version requires a commercial license per MAC • User-space ZC (new generation DNA, Direct NIC Access) drivers for extreme packet capture/transmission speed, as the NIC NPU (Network Process Unit) pushes/gets packets to/from userland without any kernel intervention. Using the 10 Gbit ZC driver you can send/receive at wire speed at any packet size • PF_RING ZC library for distributing packets in zero-copy across threads, applications, Virtual Machines • Support of Accolade, Exablaze, Endace, Fiberblaze, Inveatech, Mellanox, Myricom/CSPI, Napatech, Netcope and Intel (ZC) network adapters • Kernel-based packet capture and sampling • Ability to specify hundreds of header filters in addition to BPF • Content inspection, so that only packets matching the payload filter are passed • PF_RING™ plugins for advanced packet parsing and content filtering • Works pretty well within the ntop ecosystem
  • 25. DPDK (Data Plane Development Kit) • Set of data plane libraries and NIC drivers • Maintained by the Linux Foundation but BSD licensed • Programming framework for x86, ARM and PowerPC • An Environment Abstraction Layer (EAL) is created for each specific hardware/software environment • Supports lots of hardware • AMD, Amazon, Aquantia, Atomic Rules, Broadcom, Cavium, Chelsio, Cisco, Intel, Marvell, Mellanox, NXP, Netcope, Solarflare • Extensible to different architectures and systems like Intel IA-32 and FreeBSD
  • 26. fd.io (Fast Data Input/Output) • Run by LFN - The LF(Linux Foundation) Networking Fund • Cisco has donated VPP(Vector Packet Processing) library to fd.io • This library has been in production by Cisco since 2003 • Leverages DPDK capabilities • Aligned to support NFV and SDN • OPNFV is a sub-project of fd.io
  • 27. netmap • A novel framework which utilizes known techniques to reduce packet-processing costs • A fast packet I/O mechanism between the NIC and user space • Removes unnecessary metadata (e.g. sk_buff) allocation • Amortized system call costs, reduced/removed data copies • Supported on both FreeBSD and Linux as a loadable kernel module • Included by default since FreeBSD 11.0 • Released under the BSD 2-Clause license; FreeBSD is the primary development platform • Supported on Intel, Realtek and Chelsio cards • 14.8 Mpps achieved on a 10G NIC with a 900 MHz CPU • Chelsio has tested 100G traffic in netmap mode with a 99.99% success rate
  • 28.
  • 29. Other ecosystems • OpenOnload by Solarflare • Napatech
  • 30. References • pf_ring https://www.ntop.org • DPDK https://www.dpdk.org • fd.io https://fd.io • netmap http://info.iet.unipi.it/~luigi/netmap/