DPDK
Data Plane Development Kit
Ariel Waizel ariel.waizel@gmail.com
2Confidential
Who Am I?
• Programmer for 12 years
– Mostly C/C++ for cyber and network applications.
– Some higher-level languages for machine learning and generic DevOps (Python and Java).
• M.Sc. in Information System Engineering (a.k.a. machine learning).
• Currently I’m a solution architect at Contextream (HPE).
– The company specializes in software-defined networks for carrier-grade companies (like Bezek, Partner, etc.).
– Appeal: Everything is open source!
– I work with customers on end-to-end solutions and create POCs.
• Gaming for fun.
Let’s Talk About DPDK
• What is DPDK?
• Why use it?
• How good is it?
• How does it work?
The Challenge
• Network applications, implemented in software, that support native network speeds.
• Network speeds:
– 10 Gb/s per NIC (Network Interface Card).
– 40 Gb/s per NIC.
– 100 Gb/s?
• Network applications:
– Network devices implemented in software (router, switch, HTTP proxy, etc.).
– Network services for virtual machines (OpenStack, etc.).
– Security: protecting your applications from DDoS.
What is DPDK?
• A set of UIO (Userspace I/O) drivers and user space libraries.
• Goal: forward network packets between the NIC (Network Interface Card) and the user application at native speed:
– 10 or 40 Gb/s NICs.
– Speed is the most (only?) important criterion.
– Only forwards the packets; it is not a network stack (but there are helpful libraries and examples to use).
• All traffic bypasses the kernel (We’ll get to why).
– When a NIC is controlled by a DPDK driver, it’s invisible to the kernel.
• Open source (BSD-3 for most, GPL for Linux Kernel related parts).
Why Should DPDK Interest Kernel Devs?
• Bypassing the kernel matters for performance, which is intriguing in itself.
– At the very least, know the “competition”.
• DPDK is a very lightweight, low-level, performance-driven framework, which makes it a good learning ground for kernel developers to learn performance guidelines.
Why bypassing the kernel is a necessity.
Why Use DPDK?
10Gb – Crunching Numbers
• Minimum Ethernet packet: 64 bytes + 20 bytes of on-wire overhead (preamble, start-of-frame delimiter, and inter-frame gap).
• Maximum number of pps: 14,880,952
– 10^10 bps / (84 bytes * 8)
• Time budget to process a single packet: 67.2 ns
• CPU cycle budget: about 201 cycles on a 3 GHz CPU (1 GHz -> 1 cycle per ns).
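The arithmetic on this slide can be reproduced with a few lines of C. This is only a sketch: the function names are made up for illustration, and the 20-byte overhead is taken as the 7-byte preamble, 1-byte start-of-frame delimiter, and 12-byte inter-frame gap.

```c
#include <assert.h>

/* On-wire overhead per frame: 7-byte preamble + 1-byte start-of-frame
 * delimiter + 12-byte inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20u

/* Maximum packets per second for a link speed (bits/s) and frame size. */
double max_pps(double link_bps, unsigned frame_bytes)
{
    assert(link_bps > 0.0);
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Time budget per packet, in nanoseconds. */
double ns_per_packet(double link_bps, unsigned frame_bytes)
{
    return 1e9 / max_pps(link_bps, frame_bytes);
}

/* CPU cycle budget per packet for a core clocked at cpu_hz. */
double cycles_per_packet(double link_bps, unsigned frame_bytes, double cpu_hz)
{
    return ns_per_packet(link_bps, frame_bytes) * (cpu_hz / 1e9);
}
```

Calling max_pps(1e10, 64) gives roughly 14.88 million packets per second, ns_per_packet(1e10, 64) gives 67.2 ns, and cycles_per_packet(1e10, 64, 3e9) gives about 201.6 cycles, matching the figures above.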
Time (ish) per Operation (July 2014)
Operation                                          Time (expected)
Cache miss                                         32 ns
L2 cache access                                    4.3 ns
L3 cache access                                    7.9 ns
“LOCK” operation (like atomic)                     8.25 ns (16.5 ns for unlock too)
syscall                                            41.85 ns (75.34 ns for audit-syscall)
SLUB allocator (Linux kernel buffer allocator)     80 ns for alloc + free
SKB (Linux kernel packet struct) memory manager    77 ns
Qdisc (TX queueing discipline)                     48 ns minimum, 58 to 68 ns average
TLB miss                                           Up to several cache misses
* Red Hat, “Challenge 10Gbit/s”, Jesper Dangaard Brouer et al., Netfilter Workshop, July 2014.
Conclusion: Must use batching (send/receive several packets together to amortize costs).
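To see why batching helps, here is a small sketch of the amortization. It assumes a fixed cost paid once per call (for example the 41.85 ns syscall above) plus a per-packet cost; the function names and the 60 ns per-packet figure used below are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Per-packet cost when a fixed per-call overhead is shared by a batch. */
double amortized_ns(double fixed_ns, double per_packet_ns, unsigned batch)
{
    assert(batch > 0);
    return fixed_ns / batch + per_packet_ns;
}

/* Smallest batch size (up to max_batch) whose amortized per-packet cost
 * fits within budget_ns. Returns 0 if no such batch size exists. */
unsigned min_batch_for_budget(double fixed_ns, double per_packet_ns,
                              double budget_ns, unsigned max_batch)
{
    for (unsigned b = 1; b <= max_batch; b++) {
        if (amortized_ns(fixed_ns, per_packet_ns, b) <= budget_ns)
            return b;
    }
    return 0;
}
```

With a 41.85 ns fixed cost and a hypothetical 60 ns of per-packet work, a single packet per call costs 101.85 ns, well over the 67.2 ns budget. min_batch_for_budget(41.85, 60.0, 67.2, 64) returns 6, since 41.85 / 6 + 60 = 66.975 ns.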
Time Per Operation – The Big No-Nos
Operation          Average time
Context switch     Microseconds (1 µs = 1000 ns)
Page fault         Microseconds
Conclusion: Must pin CPUs and pre-allocate resources
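The pre-allocation half of this conclusion can be sketched as a fixed-size buffer pool that takes all its memory at startup. This is not DPDK's rte_mempool; the struct and function names are hypothetical. CPU pinning is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size buffer pool: all memory is obtained once, up front. */
struct buf_pool {
    void  **free_list;   /* stack of currently free buffers */
    size_t  free_count;  /* number of buffers on the stack */
    size_t  capacity;    /* total buffers owned by the pool */
    char   *storage;     /* one contiguous allocation backing all buffers */
};

/* Allocate and touch every buffer at startup. Returns 0 on success, -1 on failure. */
int buf_pool_init(struct buf_pool *p, size_t count, size_t buf_size)
{
    if (count == 0 || buf_size == 0 || count > SIZE_MAX / buf_size)
        return -1;
    p->storage = malloc(count * buf_size);
    p->free_list = malloc(count * sizeof(*p->free_list));
    if (p->storage == NULL || p->free_list == NULL) {
        free(p->storage);
        free(p->free_list);
        return -1;
    }
    /* Touch the memory now so page faults happen here, not on the fast path. */
    memset(p->storage, 0, count * buf_size);
    for (size_t i = 0; i < count; i++)
        p->free_list[i] = p->storage + i * buf_size;
    p->free_count = p->capacity = count;
    return 0;
}

/* Fast path: pop a free buffer, or NULL if the pool is exhausted. */
void *buf_pool_get(struct buf_pool *p)
{
    return p->free_count ? p->free_list[--p->free_count] : NULL;
}

/* Fast path: push a buffer back onto the free stack. */
void buf_pool_put(struct buf_pool *p, void *buf)
{
    assert(p->free_count < p->capacity);
    p->free_list[p->free_count++] = buf;
}

/* Release the pool's memory at shutdown. */
void buf_pool_destroy(struct buf_pool *p)
{
    free(p->storage);
    free(p->free_list);
}
```

After buf_pool_init, getting and returning a buffer is a couple of loads and stores, with no allocator call and no system call. The sketch is single-threaded; sharing it between threads would need a lock or a lockless structure.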
As of today, DPDK can handle 11X the traffic the Linux kernel can!
Benchmarks at the end.
How It Works.
DPDK Architecture
DPDK – What do you get?
• UIO drivers
• PMD per hardware NIC:
– PMD (Poll Mode Driver) support for RX/TX (Receive and Transmit).
– Mapping PCI memory and registers.
– Mapping user memory (for example: packet memory buffers) into the NIC.
– Configuration of specific HW accelerations in the NIC.
• User space libraries:
– Initialize PMDs
– Threading (builds on pthread).
– CPU Management
– Memory Management (Huge pages only!).
– Hashing, Scheduler, pipeline (packet steering), etc. – High performance support libraries for the application.
From NIC to Process – Pipeline Model
[Diagram: NIC with RX and TX queues, PMD threads, rings, DPDK app threads, and igb_uio]
* igb_uio is the DPDK standard kernel uio driver for device control plane
From NIC to Process – Run To Completion Model
[Diagram: NIC with RX and TX queues, combined DPDK app + PMD threads, and igb_uio]
Software Configuration
• C Code
– GCC 4.5.X or later.
• Required:
– Kernel version >= 2.6.34
– hugetlbfs (for best performance use 1 GB pages, which require GRUB configuration).
• Recommended:
– isolcpus in GRUB configuration: isolates CPUs from the scheduler.
• Compilation
– DPDK applications are statically compiled/linked with the DPDK libraries, for best performance.
– Export RTE_SDK and RTE_TARGET to develop the application from a separate directory.
• Setup:
– Best to use tools/dpdk-setup.sh to set up/build the environment.
– Use tools/dpdk-devbind.py
o --status lets you see the available NICs and their corresponding drivers.
o --bind lets you bind a NIC to a driver (igb_uio, for example).
– Run the application with the appropriate arguments.
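One way to see why the hugetlbfs requirement above matters is to count how many pages (and so how many TLB translations) cover a given amount of packet memory. This is a minimal sketch; the function name and the 1 GiB example are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages of page_size bytes needed to map `bytes` of memory. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;  /* round up */
}
```

Mapping 1 GiB of buffers takes 262,144 pages of 4 KiB, 512 pages of 2 MiB, or a single 1 GiB page. Fewer pages means fewer distinct translations competing for TLB entries.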
Software Architecture
• PMDs are just user space pthreads that call specific EAL functions.
– These EAL functions have concrete implementations per NIC, which costs a couple of indirections.
– Access to RX/TX descriptors is direct.
– Uses the UIO driver for specific control changes (like configuring interrupts).
• Most DPDK libraries are not thread-safe.
– PMDs are non-preemptive: you can’t have 2 PMDs handle the same HW queue on the same NIC.
• All inter-thread communication is based on librte_ring.
– A multi-producer, multi-consumer (MP-MC), lockless, fixed-size ring queue implementation.
– Optimized for DPDK purposes.
• All resources, such as memory (malloc), threads, and descriptor queues, are initialized at the start.
Software Architecture
Application Bring up
Code Example – Basic Forwarding
• Main Function:
DPDK Init
Get all available NICs (bound to igb_uio)
Initialize packet buffers
Initialize NICs.
Code Example – port_init
NIC init: Set number of queues
Rx queue init: Set packet buffer pool for queue
* Uses librte_ring to be thread safe
Tx queue init: No need for buffer pool
Start Getting Packets
Code Example – lcore_main
PMD
Igb_uio
• For Intel Gigabit NICs.
– Simple enough to work for most NICs, nonetheless.
• Basically:
– Calls pci_enable_device.
– Enables bus mastering on the device (pci_set_master).
– Requests all BARs and maps them using ioremap.
– Sets up I/O ports.
– Sets the DMA mask to 64-bit.
• Code to support SR-IOV and Xen.
rte_ring
• Fixed-size, “lockless” ring queue.
• Non-preemptive.
• Supports multiple/single producer/consumer, and bulk actions.
• Uses:
– Single array of pointers.
– Head/tail pointers for both producer and consumer (total 4 pointers).
• To enqueue (dequeue works the same way):
– Until successful:
o Save the current head_ptr in a local variable.
o head_next = head_ptr + num_objects
o CAS the head_ptr to head_next
– Insert objects.
– Until successful:
o Update tail_ptr = head_next once tail_ptr equals the saved head_ptr
• Analysis:
– Lightweight.
– In theory, both loops are costly.
– In practice, as all threads are CPU bound, the amortized cost of the first loop is low, and the second loop is very unlikely to spin.
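The two loops described above can be sketched with C11 atomics. This is an illustration, not DPDK's rte_ring source: the struct layout, the names, the fixed RING_SIZE, and the default sequentially consistent ordering are all assumptions made to keep the sketch short.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u                  /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    _Atomic uint32_t prod_head;  /* next slot a producer may reserve */
    _Atomic uint32_t prod_tail;  /* slots before this are visible to consumers */
    _Atomic uint32_t cons_head;  /* next slot a consumer may reserve */
    _Atomic uint32_t cons_tail;  /* slots before this are free for producers */
    void *slots[RING_SIZE];
};

/* Enqueue n objects, all or nothing. Returns n, or 0 if there is no room. */
unsigned ring_enqueue_bulk(struct ring *r, void *const *objs, unsigned n)
{
    uint32_t head = atomic_load(&r->prod_head);
    uint32_t next;

    /* Loop 1: reserve n slots by moving prod_head with a CAS.
     * On failure the CAS reloads head, so the check is repeated. */
    do {
        uint32_t free_slots = RING_SIZE - (head - atomic_load(&r->cons_tail));
        if (n > free_slots)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

    /* Insert objects into the reserved slots. */
    for (unsigned i = 0; i < n; i++)
        r->slots[(head + i) & RING_MASK] = objs[i];

    /* Loop 2: wait until earlier producers have published, then publish ours. */
    while (atomic_load(&r->prod_tail) != head) {
        /* spin */
    }
    atomic_store(&r->prod_tail, next);
    return n;
}

/* Dequeue n objects, all or nothing. Mirrors the enqueue path. */
unsigned ring_dequeue_bulk(struct ring *r, void **objs, unsigned n)
{
    uint32_t head = atomic_load(&r->cons_head);
    uint32_t next;

    do {
        uint32_t avail = atomic_load(&r->prod_tail) - head;
        if (n > avail)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->cons_head, &head, next));

    for (unsigned i = 0; i < n; i++)
        objs[i] = r->slots[(head + i) & RING_MASK];

    while (atomic_load(&r->cons_tail) != head) {
        /* spin */
    }
    atomic_store(&r->cons_tail, next);
    return n;
}
```

A zero-initialized struct ring is an empty ring. The indices run freely and are masked only when indexing the array, which is why the size must be a power of two. The second loop is where non-preemption matters: if a producer is descheduled between reserving and publishing, every later producer spins until it returns.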
rte_mempool
• Smart memory allocation
• Allocates the start of each memory buffer at a different memory channel/rank:
– Most applications only want to see the first 64 bytes of the packet (Ethernet + IP headers).
– Requests for memory at different channels, same rank, are done concurrently by the memory controller.
– Requests for memory at different ranks can be managed effectively by the memory controller.
– Pads objects until gcd(obj_size, num_ranks * num_channels) == 1.
• Maintains a per-core cache and sends bulk requests to the mempool ring.
• Allocates memory based on:
– NUMA
– Contiguous virtual memory (which also means contiguous physical memory for huge pages).
• Non-preemptive.
– Same lcore must not context switch to another task using the same mempool.
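The padding rule above can be sketched as follows. The sketch assumes obj_size is measured in fixed-size units (64-byte cache lines in the example); that unit, and the function names, are assumptions for illustration and not taken from the rte_mempool source.

```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Round obj_bytes up to whole units, then pad one unit at a time until the
 * size in units is coprime with channels * ranks. Returns the size in units. */
unsigned padded_units(unsigned obj_bytes, unsigned unit_bytes,
                      unsigned channels, unsigned ranks)
{
    assert(unit_bytes > 0 && channels > 0 && ranks > 0);
    unsigned units = (obj_bytes + unit_bytes - 1) / unit_bytes;
    while (gcd_u(units, channels * ranks) != 1)
        units++;
    return units;
}
```

For a 2048-byte object with 64-byte units, 4 channels and 2 ranks, the object is 32 units and gcd(32, 8) = 8, so one unit of padding is added: padded_units(2048, 64, 4, 2) returns 33 (2112 bytes). Because 33 and 8 are coprime, the start offsets of consecutive objects cycle through all 8 residues modulo channels * ranks instead of landing on the same one every time.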
Linux kernel Vs. DPDK
Benchmarks
Linux Kernel Benchmarks, single core
Benchmark                        July 2014    Feb. 2016
TX                               4 Mpps       14.8 Mpps
RX (dump at driver)              6.4 Mpps     12 Mpps (experimental)
L3 forwarding (RX+filter+TX)     1 Mpps       2 Mpps
L3 forwarding (multi core)       6 Mpps       12 Mpps
* Red Hat The 100Gbit/s Challenge Jesper Dangaard Brouer et al. DevConf Feb. 2016.
DPDK Benchmarks (March 2016)
Benchmark                          Single core    Multi core
L3 forwarding (PHY-PHY)            22 Mpps        Linear increase
Switch forwarding (PHY-OVS-PHY)    11 Mpps        Linear increase
VM forwarding (PHY-VM-PHY)         3.4 Mpps       Near-linear increase
VM to VM                           2 Mpps         Linear increase
* Intel Open Network Platform Release 2.1 Performance Test Report, March 2016.
• All tests with:
– 4 x 40 Gb ports
– E5-2695 V4 2.1 GHz processor
– 16 x 1 GB huge pages, 2048 x 2 MB huge pages
DPDK Benchmarks Figures
[Figures: benchmark plots for PHY-PHY, PHY-OVS-PHY, and VM-VM]
DPDK Pros and Cons
DPDK Advantages
• Best forwarding performance solution to/from PHY/process/VM to date.
– Best single core performance.
– Scales: linear performance increase per core.
• Active and longstanding community (from 2012).
– DPDK.org
– Full of tutorials, examples and complementary features.
• Active, popular products:
– OVS-DPDK is the main solution for high-speed networking in OpenStack.
– 6WIND.
– TRex.
• Great virtualization support.
– Deploy at the host of the guest environment.
DPDK Disadvantages
• Security
• Isolated ecosystem:
– Hard to use Linux kernel infrastructure (though there are precedents).
• Requires modified code in applications to use:
– DPDK processes use a very specific API.
– DPDK applications can’t interact transparently with Linux processes (important for transparent networking applications like firewalls, DDoS mitigation, etc.).
o Solved for interaction with VMs by the vhost library.
• Requires huge pages (XDP doesn’t).

Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 

Dernier (20)

HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

Introduction to DPDK

  • 1. DPDK Data Plane Development Kit Ariel Waizel ariel.waizel@gmail.com
  • 2. Who Am I? • Programmer for 12 years – Mostly C/C++ for cyber and network applications. – Some higher-level languages for machine learning and generic DevOps (Python and Java). • M.Sc. in Information Systems Engineering (a.k.a. machine learning). • Currently I’m a solution architect at Contextream (HPE) – The company specializes in software-defined networks for carrier-grade companies (like Bezek, Partner, etc.). – Appeal: Everything is open source! – I work with customers on end-to-end solutions and create POCs. • Gaming for fun.
  • 3. Let’s Talk About DPDK • What is DPDK? • Why use it? • How good is it? • How does it work?
  • 4. The Challenge • Network applications, done in software, supporting native network speeds. • Network speeds: – 10 Gb/s per NIC (Network Interface Card). – 40 Gb/s per NIC. – 100 Gb/s? • Network applications: – Network devices implemented in software (router, switch, HTTP proxy, etc.). – Network services for virtual machines (OpenStack, etc.). – Security: protect your applications from DDoS.
  • 5. What is DPDK? • Set of UIO (User IO) drivers and user space libraries. • Goal: forward network packets to/from the NIC (Network Interface Card) from/to the user application at native speed: – 10 or 40 Gb/s NICs. – Speed is the most (only?) important criterion. – Only forwards the packets; it is not a network stack (but there are helpful libraries and examples to use). • All traffic bypasses the kernel (we’ll get to why). – When a NIC is controlled by a DPDK driver, it’s invisible to the kernel. • Open source (BSD-3 for most, GPL for Linux kernel related parts).
  • 6. Why Should DPDK Interest Kernel Devs? • Bypassing the kernel is important for performance, which is intriguing by itself. – At the very least, know the “competition”. • DPDK is a very lightweight, low-level, performance-driven framework, which makes it a good learning ground for kernel developers to learn performance guidelines.
  • 7. Why Use DPDK? Why bypassing the kernel is a necessity.
  • 8. 10 Gb/s – Crunching Numbers • Minimum Ethernet packet: 64 bytes + 20 bytes of preamble and inter-frame gap = 84 bytes on the wire. • Maximum number of packets per second (pps): 14,880,952 – 10^10 bps / (84 bytes * 8) • Time budget to process a single packet: 67.2 ns • CPU cycle budget: about 201 cycles on a 3 GHz CPU (1 GHz -> 1 cycle per ns).
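The arithmetic on this slide can be sketched in a few lines of C. The function names below are illustrative and not part of DPDK; the only inputs are the link speed and the 84-byte wire footprint quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a minimum-size frame occupies on the wire: a 64-byte frame plus
 * 20 bytes of preamble and inter-frame gap. */
#define WIRE_BYTES_MIN_FRAME 84u

/* Maximum packets per second at a given link speed in bits per second. */
uint64_t max_pps(uint64_t link_bps)
{
    return link_bps / (WIRE_BYTES_MIN_FRAME * 8u);
}

/* Time budget per minimum-size packet, in nanoseconds. */
double ns_per_packet(uint64_t link_bps)
{
    assert(link_bps > 0);
    return (WIRE_BYTES_MIN_FRAME * 8.0) * 1e9 / (double)link_bps;
}

/* Cycle budget per packet for a CPU clocked at cpu_ghz (cycles per ns). */
double cycles_per_packet(uint64_t link_bps, double cpu_ghz)
{
    return ns_per_packet(link_bps) * cpu_ghz;
}
```

For a 10 Gb/s link (`10000000000ULL`), `max_pps` gives 14,880,952, `ns_per_packet` gives 67.2, and `cycles_per_packet` with 3.0 GHz gives about 201.6, matching the slide's figures.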
  • 9. Time (ish) per Operation (July 2014)
      Operation: Time (expected)
      – Cache miss: 32 ns
      – L2 cache access: 4.3 ns
      – L3 cache access: 7.9 ns
      – “LOCK” operation (like atomic): 8.25 ns (16.5 ns for unlock too)
      – Syscall: 41.85 ns (75.34 ns for audit-syscall)
      – SLUB allocator (Linux kernel buffer allocator): 80 ns for alloc + free
      – SKB (Linux kernel packet struct) memory manager: 77 ns
      – Qdisc (tx queue descriptor): 48 ns minimum, between 58 and 68 ns on average
      – TLB miss: up to several cache misses
      * Red Hat, Challenge 10Gbit/s, Jesper Dangaard Brouer et al., Netfilter Workshop, July 2014.
      Conclusion: Must use batching (send/receive several packets together to amortize costs).
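To see why batching helps, here is a minimal sketch of the amortization, assuming a fixed overhead (such as one syscall) that is paid once per batch rather than once per packet. The function name and the split into fixed and per-packet cost are illustrative assumptions, not taken from the talk.

```c
#include <assert.h>

/* Per-packet cost when a fixed overhead is paid once per batch of
 * `batch` packets, plus a cost that is paid for every packet. */
double amortized_ns(double fixed_ns, double per_packet_ns, unsigned batch)
{
    assert(batch > 0);
    return fixed_ns / (double)batch + per_packet_ns;
}
```

With the 41.85 ns syscall figure from the table, `amortized_ns(41.85, 0.0, 1)` charges the full 41.85 ns to a single packet, while `amortized_ns(41.85, 0.0, 32)` charges about 1.31 ns per packet, which fits far more comfortably in a 67.2 ns budget.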
  • 10. Time Per Operation – The Big No-Nos
      Operation: Average time
      – Context switch: microseconds (1000 ns)
      – Page fault: microseconds
      Conclusion: Must pin CPUs and pre-allocate resources.
  • 11. As of today, DPDK can handle 11x the traffic the Linux kernel can! Benchmarks at the end.
  • 13. DPDK – What do you get? • UIO drivers • PMD per hardware NIC: – PMD (Poll Mode Driver) support for RX/TX (Receive and Transmit). – Mapping PCI memory and registers. – Mapping user memory (for example: packet memory buffers) into the NIC. – Configuration of specific HW accelerations in the NIC. • User space libraries: – Initialize PMDs. – Threading (builds on pthread). – CPU management. – Memory management (huge pages only!). – Hashing, scheduler, pipeline (packet steering), etc. – High performance support libraries for the application.
  • 14. From NIC to Process – Pipeline Model [Diagram components: NIC, RX queues, PMDs, rings, DPDK apps, TX queues, igb_uio] * igb_uio is the DPDK standard kernel UIO driver for the device control plane.
  • 15. From NIC to Process – Run-to-Completion Model [Diagram components: NIC, RX queues, DPDK app + PMD threads, TX queues, igb_uio]
  • 16. Software Configuration • C code – GCC 4.5.X or later. • Required: – Kernel version >= 2.6.34. – hugetlbfs (for best performance use 1G pages, which require GRUB configuration). • Recommended: – isolcpus in GRUB configuration: isolates CPUs from the scheduler. • Compilation: – DPDK applications are statically compiled/linked with the DPDK libraries, for best performance. – Export RTE_SDK and RTE_TARGET to develop the application from a separate directory. • Setup: – Best to use tools/dpdk-setup.sh to set up/build the environment. – Use tools/dpdk-devbind.py: o --status lets you see the available NICs and their corresponding drivers. o --bind lets you bind a NIC to a driver (igb_uio, for example). – Run the application with the appropriate arguments.
  • 17. Software Architecture • PMD drivers are just user space pthreads that call specific EAL functions. – These EAL functions have concrete implementations per NIC, which costs a couple of indirections. – Access to RX/TX descriptors is direct. – Uses the UIO driver for specific control changes (like configuring interrupts). • Most DPDK libraries are not thread-safe. – PMD drivers are non-preemptive: you can’t have 2 PMDs handle the same HW queue on the same NIC. • All inter-thread communication is based on librte_ring. – A multi-producer/multi-consumer (MP-MC), lockless, non-resizable ring queue implementation. – Optimized for DPDK purposes. • All resources such as memory (malloc), threads, descriptor queues, etc. are initialized at the start.
  • 20. Code Example – Basic Forwarding • Main function: – DPDK init. – Get all available NICs (bound to igb_uio). – Initialize packet buffers. – Initialize NICs.
  • 21. Code Example – port_init • NIC init: set the number of queues. • RX queue init: set the packet buffer pool for the queue. * Uses librte_ring to be thread safe. • TX queue init: no need for a buffer pool. • Start getting packets.
  • 23. igb_uio • For Intel Gigabit NICs. – Simple enough to work for most NICs, nonetheless. • Basically: – Calls pci_enable_device. – Enables bus mastering on the device (pci_set_master). – Requests all BARs and maps them using ioremap. – Sets up I/O ports. – Sets the DMA mask to 64-bit. • Code to support SR-IOV and Xen.
  • 24. rte_ring • Fixed-size, “lockless” ring queue. • Non-preemptive. • Supports multiple/single producer/consumer, and bulk actions. • Uses: – A single array of pointers. – Head/tail pointers for both producer and consumer (4 pointers in total). • To enqueue (dequeue works the same way): – Until successful: o Save the current head_ptr in a local variable. o head_next = head_ptr + num_objects. o CAS head_ptr to head_next. – Insert the objects. – Until successful: o Set tail_ptr = head_next once tail_ptr equals the saved head_ptr. • Analysis: – Lightweight. – In theory, both loops are costly. – In practice, as all threads are CPU bound, the amortized cost of the first loop is low, and the second loop is very unlikely to spin.
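The enqueue/dequeue steps on this slide can be sketched with C11 atomics. This is a simplified illustration and not the librte_ring source: the names (`struct ring`, `ring_enqueue_bulk`, `RING_SIZE`) are hypothetical, it uses default sequentially consistent atomics, and it assumes free-running 32-bit indices over a power-of-two slot array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* illustrative; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    _Atomic uint32_t prod_head;      /* next slot a producer may reserve */
    _Atomic uint32_t prod_tail;      /* slots below this are visible to consumers */
    _Atomic uint32_t cons_head;      /* next slot a consumer may reserve */
    _Atomic uint32_t cons_tail;      /* slots below this are free for producers */
    void *slots[RING_SIZE];
};

void ring_init(struct ring *r)
{
    assert((RING_SIZE & RING_MASK) == 0);
    atomic_init(&r->prod_head, 0);
    atomic_init(&r->prod_tail, 0);
    atomic_init(&r->cons_head, 0);
    atomic_init(&r->cons_tail, 0);
}

/* Multi-producer bulk enqueue: returns n on success, 0 if there is no room. */
unsigned ring_enqueue_bulk(struct ring *r, void *const *objs, unsigned n)
{
    uint32_t head = atomic_load(&r->prod_head);
    uint32_t next;

    /* Loop 1: reserve n slots by advancing prod_head with a CAS.
     * On failure the CAS reloads head with the current value. */
    do {
        uint32_t free_slots = RING_SIZE - (head - atomic_load(&r->cons_tail));
        if (n > free_slots)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

    /* Insert the objects into the reserved slots. */
    for (unsigned i = 0; i < n; i++)
        r->slots[(head + i) & RING_MASK] = objs[i];

    /* Loop 2: wait until earlier producers have published, then publish. */
    while (atomic_load(&r->prod_tail) != head)
        ;                            /* spin */
    atomic_store(&r->prod_tail, next);
    return n;
}

/* Multi-consumer bulk dequeue: mirror image of the enqueue path. */
unsigned ring_dequeue_bulk(struct ring *r, void **objs, unsigned n)
{
    uint32_t head = atomic_load(&r->cons_head);
    uint32_t next;

    do {
        uint32_t avail = atomic_load(&r->prod_tail) - head;
        if (n > avail)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->cons_head, &head, next));

    for (unsigned i = 0; i < n; i++)
        objs[i] = r->slots[(head + i) & RING_MASK];

    while (atomic_load(&r->cons_tail) != head)
        ;                            /* spin */
    atomic_store(&r->cons_tail, next);
    return n;
}
```

The second loop is why the slide calls the ring non-preemptive: if a thread is descheduled between its CAS and its tail update, every later producer (or consumer) spins until it runs again.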
  • 25. rte_mempool • Smart memory allocation. • Allocates the start of each memory buffer at a different memory channel/rank: – Most applications only want to see the first 64 bytes of the packet (Ethernet + IP header). – Requests for memory at different channels, same rank, are done concurrently by the memory controller. – Requests for memory at different ranks can be managed effectively by the memory controller. – Pads objects until gcd(obj_size, num_ranks * num_channels) == 1. • Maintains a per-core cache and issues bulk requests to the mempool ring. • Allocates memory based on: – NUMA. – Contiguous virtual memory (which also means contiguous physical memory for huge pages). • Non-preemptive. – The same lcore must not context switch to another task using the same mempool.
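The padding rule on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the rte_mempool source: the function names are hypothetical, and the object size is assumed to be counted in 64-byte units before the gcd test is applied.

```c
#include <assert.h>

#define ALIGN_UNIT 64u   /* assumed alignment unit, in bytes */

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Round obj_size up to whole units, then keep adding one unit until the
 * unit count is coprime with channels * ranks, so that consecutive
 * objects start on different channels/ranks. Returns the size in bytes. */
unsigned padded_obj_size(unsigned obj_size, unsigned channels, unsigned ranks)
{
    assert(channels > 0 && ranks > 0);
    unsigned units = (obj_size + ALIGN_UNIT - 1u) / ALIGN_UNIT;
    while (gcd_u(units, channels * ranks) != 1u)
        units++;
    return units * ALIGN_UNIT;
}
```

For example, a 2048-byte object with 4 channels and 2 ranks is 32 units, which shares a factor with 8, so it is padded to 33 units (2112 bytes). Successive objects then begin on successive channel/rank positions instead of all landing on the same one.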
  • 27. Linux Kernel Benchmarks, single core
      Benchmark: July 2014 / Feb. 2016
      – TX: 4 Mpps / 14.8 Mpps
      – RX (dump at driver): 6.4 Mpps / 12 Mpps (experimental)
      – L3 forwarding (RX + filter + TX): 1 Mpps / 2 Mpps
      – L3 forwarding (multi core): 6 Mpps / 12 Mpps
      * Red Hat, The 100Gbit/s Challenge, Jesper Dangaard Brouer et al., DevConf, Feb. 2016.
  • 28. DPDK Benchmarks (March 2016)
      Benchmark: Single core / Multi core
      – L3 forwarding (PHY-PHY): 22 Mpps / linear increase
      – Switch forwarding (PHY-OVS-PHY): 11 Mpps / linear increase
      – VM forwarding (PHY-VM-PHY): 3.4 Mpps / near-linear increase
      – VM to VM: 2 Mpps / linear increase
      * Intel Open Network Platform Release 2.1 Performance Test Report, March 2016.
      • All tests with: 4 x 40 Gb ports; E5-2695 v4 2.1 GHz processor; 16 x 1 GB huge pages, 2048 x 2 MB huge pages.
  • 31. DPDK Advantages • Best forwarding performance solution to/from PHY/process/VM to date. – Best single core performance. – Scales: linear performance increase per core. • Active and longstanding community (from 2012). – DPDK.org – Full of tutorials, examples and complementary features. • Active, popular products: – OVS-DPDK is the main solution for high speed networking in OpenStack. – 6WIND. – TRex. • Great virtualization support. – Deploy at the host or the guest environment.
  • 32. DPDK Disadvantages • Security. • Isolated ecosystem: – Hard to use Linux kernel infrastructure (although there are precedents). • Requires modified code in applications to use: – DPDK processes use a very specific API. – A DPDK application can’t interact transparently with Linux processes (important for transparent networking applications like firewalls, DDoS mitigation, etc.). o Solved for interaction with VMs by the vhost library. • Requires huge pages (XDP doesn’t).