SlideShare une entreprise Scribd logo
1  sur  42
Télécharger pour lire hors ligne
SR-IOV ixgbe driver limitations and
improvement
2016-07-15
Hiroshi Shimamoto
3 © NEC Corporation 2015 NEC Group Internal Use Only3 © NEC Corporation 2016
Who am I
Hiroshi Shimamoto
▌Working for NEC
▌Main activities in open source community
Carrier Grade Linux
Virtualization
Packet Processing, DPDK
to Make Linux system usable for Telecom Carriers
Outline
SR-IOV and ixgbe implementation
SR-IOV ixgbe limitations for NFV
Addressing the issues
Future work and possible security issues
5 © NEC Corporation 2015 NEC Group Internal Use Only5 © NEC Corporation 2016
▌What is SR-IOV
One of device virtualization technology
Most of people here may know it
▌In SR-IOV enabled PCI device provides PF (Physical Function)
and VF (Virtual Function)
VF will be used directly in VM
PCI device
SR-IOV
VF
0
PF VF
1
VF
2 ...
VM VM VM
PCI pass through
6 © NEC Corporation 2015 NEC Group Internal Use Only6 © NEC Corporation 2016
Why SR-IOV?
▌I/O in virtual machines is a performance bottleneck
▌Especially packet switching between VM and host
▌Technologies were introduced to address network performance
PCI pass through
SR-IOV
DPDK
7 © NEC Corporation 2015 NEC Group Internal Use Only7 © NEC Corporation 2016
SR-IOV and etc. comparison
▌Brief pros and cons each technology
PCI Pass through SR-IOV DPDK
pros Hardware
performance
Multiple VMs
Hardware
performance
Multiple VMs
Software flexibility
No special driver in
guest
cons Single VM Device support
VF driver in guest
Consume much
CPU power
8 © NEC Corporation 2015 NEC Group Internal Use Only8 © NEC Corporation 2016
How SR-IOV implemented in Intel 82599
▌Target device: Intel 82599 10GbE Controller and ixgbe driver for
PF and ixgbevf driver for VF
▌There are 64 VMDq (Virtual Machine Device queue) in 82599
Calls this queue pool
For SR-IOV, each VF has associated pool
82599 switches a packet to pool (VF)
▌The current ixgbe driver map
VF pool
0 0
1 1
: :
N N
PF N+1
Intel 82599 10GbE controller datasheet URL
http://www.intel.com/content/www/us/en/embedded/products/networking/82599-10-gbe-controller-datasheet.html
9 © NEC Corporation 2015 NEC Group Internal Use Only9 © NEC Corporation 2016
Packet switching in Intel 82599
▌In SR-IOV mode
▌82599 chip switches a packet to corresponding pool
ref. datasheet
7.10.3 Packet Switching
82599 NIC
VF
0
PFVF
1
VF
2
...
Packet Switch
Rx Tx
pools
10 © NEC Corporation 2015 NEC Group Internal Use Only10 © NEC Corporation 2016
Rx packet switching
▌Receive a packet from outside
▌Step
1. Exact unicast or multicast match
2. Broadcast
3. Unicast hash*
4. Multicast hash
5. Multicast promiscuous*
6. VLAN group
7. Default pool
8. Ethertype filters
9. PFVFRE
10.Mirroring*
VF
0
PFVF
1
VF
2
...
Packet Switch
Note*
driver didn’t support these features ref. datasheet
7.10.3.3 Rx Packet Switching
11 © NEC Corporation 2015 NEC Group Internal Use Only11 © NEC Corporation 2016
Tx packet switching
▌Receive a packet from device (PF and/or VF)
▌Step
1. Exact unicast or multicast match
2. Broadcast
3. Unicast hash*
4. Multicast hash
5. Multicast promiscuous*
6. Filer source pool
7. VLAN groups
8. Forwarding to the network
9. PFVFRE
10.Mirroring*
11.PFVFRE
VF
0
PFVF
1
VF
2
...
Packet Switch
Note*
driver didn’t support these features ref. datasheet
7.10.3.4 Tx Packet Switching
12 © NEC Corporation 2015 NEC Group Internal Use Only12 © NEC Corporation 2016
Use Intel 82599 NIC SR-IOV as switch
▌Typical use case
Intel 82599 NIC
PFVF
2 ...
Virtualization
VM0
Guest OS
VF driver
VF
0
NIC
VM1
Guest OS
VF
1
VF driver
NIC
switch
13 © NEC Corporation 2015 NEC Group Internal Use Only13 © NEC Corporation 2016
NFV (Network Function Virtualization)
▌Using virtualization technology, implements Network Functions
on general purpose hardware, to improve flexibility, agility and
efficiency
▌Network Function is virtualized
 VNF (Virtual Network Function) is realized on VM
Virtualization Layer
Orchestration/Resource Management
IMS
VM
VM
EPC
VM
VM
L3SW
VM
VM
GW
VM
VM
Data Base
VM
VM
App Server
VM
VM
IMS EPC L3SW GW
Data
Base
App
Server
Dedicated/Specific Hardware General Purpose Hardware
14 © NEC Corporation 2015 NEC Group Internal Use Only14 © NEC Corporation 2016
SR-IOV ixgbe driver limitations for NFV
▌Using Intel 82599 and SR-IOV, 3 critical limitations for NFV
 VLAN filtering
 Multicast addresses
 Unicast promiscuous
Come from hardware limitation and software (driver) limitation
▌Explain with 2 use cases
Router
Layer 2 switch
Intel 82599 NIC
PFVF
2 ...
Virtualization
VNF
Guest OS
VF driver
VF
0
NIC
VNF
Guest OS
VF
1
VF driver
NIC
switch
NFV use case
VNF is run on VM
15 © NEC Corporation 2015 NEC Group Internal Use Only15 © NEC Corporation 2016
NFV use case 1
▌Network Function: Router
VLAN is used to virtualize the network
Access NetworkUser Equipment Private Network
VNF
VLAN:1
VLAN:2
VLAN:3
VLAN:N
...
Terminates
many VLANs
and IPs
...
16 © NEC Corporation 2015 NEC Group Internal Use Only16 © NEC Corporation 2016
VLAN filtering limitation
▌Intel 82599 has hardware VLAN filter
▌The problem comes from
 VLAN filter has only 64 entries
 Driver always enable VLAN filter if SR-IOV enabled
 Only 64 VLANs can be used with SR-IOV
17 © NEC Corporation 2015 NEC Group Internal Use Only17 © NEC Corporation 2016
Example: VF device returns error against adding VLAN
▌Try to create 64 VLANs on VM
# for i in `seq 1 64`; do
> echo "vlan $i"
> ip link add link ens6 name ens6.$i type vlan id $i
> done
vlan 1
vlan 2
:
vlan 63
vlan 64
RTNETLINK answers: Permission denied
Note
The ixgbe driver use the first entry for VLAN 0,
that means actual the number of VLANs is 63
18 © NEC Corporation 2015 NEC Group Internal Use Only18 © NEC Corporation 2016
Enabling VLAN filter automatically in driver (latest)
▌upstream
static void ixgbe_vlan_promisc_enable(struct ixgbe_adapter *adapter)
{
:
switch (hw->mac.type) {
case ixgbe_mac_82599EB:
case ixgbe_mac_X540:
case ixgbe_mac_X550:
case ixgbe_mac_X550EM_x:
case ixgbe_mac_x550em_a:
default:
if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED)
break;
/* fall through */
case ixgbe_mac_82598EB:
/* legacy case, we can just disable VLAN filtering */
vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl);
return;
}
19 © NEC Corporation 2015 NEC Group Internal Use Only19 © NEC Corporation 2016
Enabling VLAN filter automatically in driver (old)
▌previous version
void ixgbe_set_rx_mode(struct net_device *netdev)
{
:
:
vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
if (netdev->flags & IFF_PROMISC) {
hw->addr_ctrl.user_set_promisc = true;
fctrl |= (IXGBE_FCTRL_UPE | IXGBE_FCTRL_MPE);
vmolr |= IXGBE_VMOLR_MPE;
/* Only disable hardware filter vlans in promiscuous mode
* if SR-IOV and VMDQ are disabled - otherwise ensure
* that hardware VLAN filters remain enabled.
*/
if (adapter->flags & (IXGBE_FLAG_VMDQ_ENABLED |
IXGBE_FLAG_SRIOV_ENABLED))
vlnctrl |= (IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
20 © NEC Corporation 2015 NEC Group Internal Use Only20 © NEC Corporation 2016
Multicast address limitation
▌The interface (PF-VF mailbox API) limits the number of addresses
▌Only first 30 multicast addresses can be registered
▌Overflowed addresses are silently dropped
 Having Multicast promiscuous is a solution
/* Each entry in the list uses 1 16 bit word. We have 30
* 16 bit words available in our HW msg buffer (minus 1 for the
* msg type). That's 30 hash values if we pack 'em right. If
* there are more than 30 MC addresses to add then punt the
* extras for now and then add code to handle more than 30 later.
* It would be unusual for a server to request that many multi-cast
* addresses except for in large enterprise network environments.
*/
cnt = netdev_mc_count(netdev);
if (cnt > 30)
cnt = 30;
msgbuf[0] = IXGBE_VF_SET_MULTICAST;
msgbuf[0] |= cnt << IXGBE_VT_MSGINFO_SHIFT;
21 © NEC Corporation 2015 NEC Group Internal Use Only21 © NEC Corporation 2016
Why so many multicast addresses?
▌Our case is to support many IPv6 addresses on VF
For Neighbor Discovery, each IPv6 address requires corresponding multicast
address
Unicast address
Multicast MAC
2001:0000 0000:0000 0000:0000 00 12:3456
ff02:0000 0000:0000 0000:0001 ff 12:3456
33:33 ff:12:34:56
Solicited node multicast address
22 © NEC Corporation 2015 NEC Group Internal Use Only22 © NEC Corporation 2016
Example: Cannot receive multicast packet
▌Some multicast packets are not passed to VF, then failed to ND
▌Assign 30 IPv6 addresses on VM (different network address)
▌Ping from other physical machine to VM
# for i in `seq 1 30`; do
> ip -6 addr add 2001:$i::1:$i/64 dev ens6
> done
# for i in `seq 1 30`; do ping6 –w 1 –c 1 –I eth2 2001:$i::1:$i; done
:
PING 2001:28::1:28(2001:28::1:28) from 2001:28::1 eth2: 56 data bytes
--- 2001:28::1:28 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms
23 © NEC Corporation 2015 NEC Group Internal Use Only23 © NEC Corporation 2016
NFV use case 2
▌Network Function: L2 switch
Access NetworkUser Equipment Private Network
VNF
...
MAC:A
MAC:B
Left to right
both of MAC:A and
MAC:B must be able to
receive through this VF
dst:A
dst:B
24 © NEC Corporation 2015 NEC Group Internal Use Only24 © NEC Corporation 2016
Unicast promiscuous
▌Only packets which match the registered MAC can be received
▌VNF should be able to handle every MAC addresses in network
▌Hard to register all MAC addresses in L2 network
 We want Unicast promiscuous feature in VF
25 © NEC Corporation 2015 NEC Group Internal Use Only25 © NEC Corporation 2016
SR-IOV limitations for NFV and current status
▌VLAN filtering
Only 64 VLANs can be used
 Proposed to add an option to disable hardware VLAN filter, but
not accepted
▌Multicast addresses
Only 30 Multicast addresses can be used
 Implemented VF Multicast promiscuous mode in ixgbe/ixgbevf
driver in Linux 4.4
▌Unicast promiscuous
Single unicast MAC address can be used
 Hardware limitation
26 © NEC Corporation 2015 NEC Group Internal Use Only26 © NEC Corporation 2016
Addressing Multicast addresses limitation
▌There is a hardware feature in 82599
VF Multicast promiscuous mode
▌But driver didn’t support that feature
▌Implement new PF-VF mailbox API in ixgbe and ixgbevf
▌First, automatically enable to VF multicast promiscuous when the
number of addresses overs 30
▌Suggested way, implement xcast mode in VF
There is ALLMULTI flag that means that every multicast packet is received in
this device
 Accepted in Linux 4.4
27 © NEC Corporation 2015 NEC Group Internal Use Only27 © NEC Corporation 2016
PF-VF mailbox APIs
▌There are mailbox APIs
Communicate between PF and VF
version API description
legacy RESET VF requests reset
SET_MAC_ADDR VF requests PF to set MAC addr
SET_MULTICAST VF requests PF to set MC addr
SET_VLAN VF requests PF to set VLAN
1.0 SET_LPE VF requests PF to set VMOLR.LPE
SET_MACVLAN VF requests PF for unicast filter
API_NEGOTIATE negotiate API version
1.1 GET_QUEUES get queue configuration
1.2 GET_RETA VF requests for RETA
GET_RSS_KEY get RSS key
UPDATE_XCAST_MODE VF requests PF to set MC mode
28 © NEC Corporation 2015 NEC Group Internal Use Only28 © NEC Corporation 2016
Implementation (PF ixgbe)
▌Handle UPDATE_XCAST_MODE API
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
:
@@ -1066,6 +1122,9 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter
*adapter, u32 vf)
case IXGBE_VF_GET_RSS_KEY:
retval = ixgbe_get_vf_rss_key(adapter, msgbuf, vf);
break;
+ case IXGBE_VF_UPDATE_XCAST_MODE:
+ retval = ixgbe_update_vf_xcast_mode(adapter, msgbuf, vf);
+ break;
default:
e_err(drv, "Unhandled Msg %8.8x¥n", msgbuf[0]);
retval = IXGBE_ERR_MBX;
29 © NEC Corporation 2015 NEC Group Internal Use Only29 © NEC Corporation 2016
Implementation (VF ixgbevf)
▌Request to PF (from ixgbevf_set_rx_mode)
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1894,9 +1894,17 @@ static void ixgbevf_set_rx_mode(struct net_device
*netdev)
{
struct ixgbevf_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = &adapter->hw;
+ unsigned int flags = netdev->flags;
+ int xcast_mode;
+
+ xcast_mode = (flags & IFF_ALLMULTI) ? IXGBEVF_XCAST_MODE_ALLMULTI :
+ (flags & (IFF_BROADCAST | IFF_MULTICAST)) ?
+ IXGBEVF_XCAST_MODE_MULTI : IXGBEVF_XCAST_MODE_NONE;
spin_lock_bh(&adapter->mbx_lock);
+ hw->mac.ops.update_xcast_mode(hw, netdev, xcast_mode);
+
/* reprogram multicast list */
hw->mac.ops.update_mc_addr_list(hw, netdev);
30 © NEC Corporation 2015 NEC Group Internal Use Only30 © NEC Corporation 2016
Security consideration
▌Enabling VF Multicast promiscuous mode causes security issues
 Can see all multicast packets through this device
 Can hurt performance
NIC duplicates packets and does DMA to each pool
 Make only trusted VF can enable Multicast promiscuous mode
static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter,
u32 *msgbuf, u32 vf)
{
:
if (xcast_mode > IXGBEVF_XCAST_MODE_MULTI &&
!adapter->vfinfo[vf].trusted) {
xcast_mode = IXGBEVF_XCAST_MODE_MULTI;
}
31 © NEC Corporation 2015 NEC Group Internal Use Only31 © NEC Corporation 2016
Implement functionality to trust VF
▌Add new operation “set_vf_trust” in net_device_ops
▌Also add support VF trust operation in iproute2 (ip command)
# ip link set dev enp3s0f0 vf 1 trust on
(dmesg)
kernel: ixgbe 0000:03:00.0 enp3s0f0: VF 1 is trusted
kernel: ixgbevf 0000:03:10.2: NIC Link is Down
kernel: ixgbe 0000:03:00.0 enp3s0f0: VF Reset msg received from vf 1
kernel: ixgbevf 0000:03:10.2: NIC Link is Up 10 Gbps
# ip link set dev enp3s0f0 vf 1 trust off
(dmesg)
kernel: ixgbe 0000:03:00.0 enp3s0f0: VF 1 is not trusted
kernel: ixgbevf 0000:03:10.2: NIC Link is Down
kernel: ixgbe 0000:03:00.0 enp3s0f0: VF Reset msg received from vf 1
kernel: ixgbevf 0000:03:10.2: NIC Link is Up 10 Gbps
Note
When trusted state is changed, target vf is reset
32 © NEC Corporation 2015 NEC Group Internal Use Only32 © NEC Corporation 2016
Future work and possible security issues
▌Still, there are limitations
 VLAN filtering
 Unicast promiscuous
▌Possible security issues
 VLAN filter is not isolated
 Multicast hash table is not handled strictly
33 © NEC Corporation 2015 NEC Group Internal Use Only33 © NEC Corporation 2016
VLAN filtering
▌Disabling hardware VLAN filtering may solve this issue
▌But
 It could break existing network feature (DCB, FCoE)
 Broadcast(Multicast) storm could cause performance degradation
 Security issue, BC/MC packets can be seen in every VF
▌Maybe okay if the network and VMs are well managed
▌Another point is that there is no suitable knob to do it now
What command is right to turn VLAN filter off in general
34 © NEC Corporation 2015 NEC Group Internal Use Only34 © NEC Corporation 2016
Unicast promiscuous
▌Supporting this feature in hardware is the best
▌No VF unicast promiscuous feature in 82599 unfortunately
35 © NEC Corporation 2015 NEC Group Internal Use Only35 © NEC Corporation 2016
Hardware support in X540 and X550
▌Later NIC chips, X540 and X550 support VLAN promiscuous and
Unicast promiscuous mode per VF
▌The issues could be solved with X540/X550 chip
▌To make framework for X540/X550 and use the same semantics
for 82599 may be needed
Field Bit(s) Description
Reserved 23:0 21:0 Reserved
UPE 22 Unicast Promiscuous Enable
VPE 23 VLAN Promiscuous Enable
AUPE 24 Accept Untagged Packet Enable
ROMPE 25 Receive Overflow Multicast Packets
ROPE 26 Receive MAC Filters Overflow
BAM 27 Broadcast Accept
MPE 28 Multicast Promiscuous Enable
Reserved 31:29 Reserved
PF VM L2Control Register
new
feature bit
36 © NEC Corporation 2015 NEC Group Internal Use Only36 © NEC Corporation 2016
Ideas
▌Mirroring
▌Unicast hash
Those features are not used/supported in driver
37 © NEC Corporation 2015 NEC Group Internal Use Only37 © NEC Corporation 2016
Possible security issues
▌In the current ixgbe/ixgbevf implementation, there are 2 issues
to be considered
 VLAN filter is not isolated
Single VLAN filter table
No limitation to request new VLAN from VF
If a VM requests 64 VLANs, other VMs never use different VLANs
 Single multicast hash table
Using multicast hash table for switching multicast packet to VF
SET_MULTICAST API is only for setting hash value, no unset functionality
Manipulating IP address assignment can make VF to have multicast
promiscuous behavior
38 © NEC Corporation 2015 NEC Group Internal Use Only38 © NEC Corporation 2016
VLAN filter is not isolated
▌If a VM uses all VLANs, other VM can’t make new VLAN
(VM0) # for i in `seq 1 64`; do
> echo "vlan $i"
> ip link add link ens6 name ens6.$i type vlan id $i
> done
vlan 1
vlan 2
:
vlan 63
vlan 64
RTNETLINK answers: Permission denied
(VM1) # ip link add link ens6 name ens6.64 type vlan id 64
RTNETLINK answers: Permission denied
and if VM does not shutdown gracefully,
registered VLAN filter entry remains
39 © NEC Corporation 2015 NEC Group Internal Use Only39 © NEC Corporation 2016
Single multicast hash table
▌ixgbe driver uses multicast hash table (MTA register)
▌Multicast Table Array is a 4Kb bitmap
i. 82599 make 12 bits hash value from MAC
ii. If corresponding bit in MTA is set, it means hit
iii. Check VF capability PFVFL2FLT.ROMPE and transfer packet
 If every bits in MTA are set, every multicast packet matches
and transferred to pool
▌MTA bit is set by SET_MULTICAST API and no unset API
▌PFVFL2FLT.ROMPE bit always set by receiving SET_MULTICAST
API message
▌Long living host may have unnecessary bits in MTA
40 © NEC Corporation 2015 NEC Group Internal Use Only40 © NEC Corporation 2016
Example: Can receive all multicast packet
▌Invoke SET_MULTICAST from VF each IP address
▌Assign 30 IP addresses again
▌Ping from other physical machine to VM
# for i in `seq 1 30`; do
> ip -6 addr add 2001:$i::1:$i/64 dev ens6
> ip -6 addr del 2001:$i::1:$i/64 dev ens6
> done
# for i in `seq 1 30`; do ping6 –w 1 –c 1 –I eth2 2001:$i::1:$i; done
:
PING 2001:30::1:30(2001:30::1:30) from 2001:30::1 eth2: 56 data bytes
64 bytes from 2001:30::1:30: icmp_seq=1 ttl=64 time=0.947 ms
--- 2001:30::1:30 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.947/0.947/0.947/0.000 ms
# for i in `seq 1 30`; do
> ip -6 addr add 2001:$i::1:$i/64 dev ens6
> done
41 © NEC Corporation 2015 NEC Group Internal Use Only41 © NEC Corporation 2016
Summary
▌SR-IOV in Intel 82599 and ixgbe driver implementation
▌SR-IOV ixgbe driver limitations for NFV
VLAN filtering
Multicast addresses
Unicast promiscuous
▌Addressing Multicast addresses issue
Implement new mailbox API and ndo VF trust
▌Future work and possible security issues
▌Questions?
▌E-Mail: h-shimamoto@ct.jp.nec.com
SR-IOV ixgbe Driver Limitations and Improvement

Contenu connexe

Tendances

Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdfAdrian Huang
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptablesKernel TLV
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...Adrian Huang
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisBuland Singh
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdfAdrian Huang
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernelAdrian Huang
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringGeorg Schönberger
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoCMacpaul Lin
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionGene Chang
 
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KernelThomas Graf
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN
 

Tendances (20)

Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdf
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
netfilter and iptables
netfilter and iptablesnetfilter and iptables
netfilter and iptables
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_Analysis
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Namespaces in Linux
Namespaces in LinuxNamespaces in Linux
Namespaces in Linux
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernel
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
Qemu Introduction
Qemu IntroductionQemu Introduction
Qemu Introduction
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux Kernel
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux Kernel
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Project ACRN hypervisor introduction
Project ACRN hypervisor introduction Project ACRN hypervisor introduction
Project ACRN hypervisor introduction
 

En vedette

NVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in LinuxNVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in LinuxLF Events
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Odinot Stanislas
 
Userspace Linux I/O
Userspace Linux I/O Userspace Linux I/O
Userspace Linux I/O Garima Kapoor
 
Hardware accelerated virtio networking for nfv linux con
Hardware accelerated virtio networking for nfv linux conHardware accelerated virtio networking for nfv linux con
Hardware accelerated virtio networking for nfv linux consprdd
 
NVMe PCIe and TLC V-NAND It’s about Time
NVMe PCIe and TLC V-NAND It’s about TimeNVMe PCIe and TLC V-NAND It’s about Time
NVMe PCIe and TLC V-NAND It’s about TimeDell World
 
Function Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe DriverFunction Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe Driver인구 강
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red_Hat_Storage
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKMaxime Coquelin
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction PerformanceLF Events
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RSimon Huang
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressOdinot Stanislas
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Odinot Stanislas
 
Identifying PCIe 3.0 Dynamic Equalization Problems
Identifying PCIe 3.0 Dynamic Equalization ProblemsIdentifying PCIe 3.0 Dynamic Equalization Problems
Identifying PCIe 3.0 Dynamic Equalization Problemsteledynelecroy
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageStorage Switzerland
 
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampPCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampFPGA Central
 
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)Ontico
 

En vedette (20)

NVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in LinuxNVMe Over Fabrics Support in Linux
NVMe Over Fabrics Support in Linux
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
 
Userspace Linux I/O
Userspace Linux I/O Userspace Linux I/O
Userspace Linux I/O
 
Hardware accelerated virtio networking for nfv linux con
Hardware accelerated virtio networking for nfv linux conHardware accelerated virtio networking for nfv linux con
Hardware accelerated virtio networking for nfv linux con
 
NVMe PCIe and TLC V-NAND It’s about Time
NVMe PCIe and TLC V-NAND It’s about TimeNVMe PCIe and TLC V-NAND It’s about Time
NVMe PCIe and TLC V-NAND It’s about Time
 
Function Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe DriverFunction Level Analysis of Linux NVMe Driver
Function Level Analysis of Linux NVMe Driver
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
Devconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDKDevconf2017 - Can VMs networking benefit from DPDK
Devconf2017 - Can VMs networking benefit from DPDK
 
Boost UDP Transaction Performance
Boost UDP Transaction PerformanceBoost UDP Transaction Performance
Boost UDP Transaction Performance
 
Introduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3RIntroduction to NVMe Over Fabrics-V3R
Introduction to NVMe Over Fabrics-V3R
 
Moving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM ExpressMoving to PCI Express based SSD with NVM Express
Moving to PCI Express based SSD with NVM Express
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
HP_NextGEN_Training_Q4_2015
HP_NextGEN_Training_Q4_2015HP_NextGEN_Training_Q4_2015
HP_NextGEN_Training_Q4_2015
 
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
Hands-on Lab: How to Unleash Your Storage Performance by Using NVM Express™ B...
 
Identifying PCIe 3.0 Dynamic Equalization Problems
Identifying PCIe 3.0 Dynamic Equalization ProblemsIdentifying PCIe 3.0 Dynamic Equalization Problems
Identifying PCIe 3.0 Dynamic Equalization Problems
 
PCIe
PCIePCIe
PCIe
 
Webinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash StorageWebinar: How NVMe Will Change Flash Storage
Webinar: How NVMe Will Change Flash Storage
 
Pci express modi
Pci express modiPci express modi
Pci express modi
 
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampPCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
 
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)
DPDK в виртуальном коммутаторе Open vSwitch / Александр Джуринский (Selectel)
 

Similaire à SR-IOV ixgbe Driver Limitations and Improvement

SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stable
SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/StableSR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stable
SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stablejuet-y
 
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stable
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/StableSR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stable
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stablejuet-y
 
82599 sriov vm configuration notes
82599 sriov vm configuration notes82599 sriov vm configuration notes
82599 sriov vm configuration notesRyan Aydelott
 
20141102 VyOS 1.1.0 and NIFTY Cloud New Features
20141102 VyOS 1.1.0 and NIFTY Cloud New Features20141102 VyOS 1.1.0 and NIFTY Cloud New Features
20141102 VyOS 1.1.0 and NIFTY Cloud New Features雄也 日下部
 
VXLAN: Enhancements and Network Integration
VXLAN: Enhancements and Network Integration VXLAN: Enhancements and Network Integration
VXLAN: Enhancements and Network Integration Eddie Parra
 
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)Odinot Stanislas
 
Power vc for powervm deep dive tips &amp; tricks
Power vc for powervm deep dive tips &amp; tricksPower vc for powervm deep dive tips &amp; tricks
Power vc for powervm deep dive tips &amp; trickssolarisyougood
 
2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANLdgoodell
 
如何用k8s打造國產5G NFV平臺? 剖析經濟部5G核網技術的關鍵
如何用k8s打造國產5G NFV平臺?剖析經濟部5G核網技術的關鍵如何用k8s打造國產5G NFV平臺?剖析經濟部5G核網技術的關鍵
如何用k8s打造國產5G NFV平臺? 剖析經濟部5G核網技術的關鍵Jace Liang
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstackIkuo Kumagai
 
Netsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvNetsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvIntel
 
The Next Step of OpenStack Evolution for NFV Deployments
The Next Step ofOpenStack Evolution for NFV DeploymentsThe Next Step ofOpenStack Evolution for NFV Deployments
The Next Step of OpenStack Evolution for NFV DeploymentsDirk Kutscher
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpJames Denton
 
SR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/StableSR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/Stablejuet-y
 
Openstack v4 0
Openstack v4 0Openstack v4 0
Openstack v4 0sprdd
 
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineShapeBlue
 
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...PROIDEA
 

Similaire à SR-IOV ixgbe Driver Limitations and Improvement (20)

10.) vxlan
10.) vxlan10.) vxlan
10.) vxlan
 
SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stable
SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/StableSR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stable
SR-IOV, KVM and Emulex OneConnect 10Gbps cards on Debian/Stable
 
Icnd210 s02l01
Icnd210 s02l01Icnd210 s02l01
Icnd210 s02l01
 
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stable
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/StableSR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stable
SR-IOV, KVM and Intel X520 10Gbps cards on Debian/Stable
 
82599 sriov vm configuration notes
82599 sriov vm configuration notes82599 sriov vm configuration notes
82599 sriov vm configuration notes
 
20141102 VyOS 1.1.0 and NIFTY Cloud New Features
20141102 VyOS 1.1.0 and NIFTY Cloud New Features20141102 VyOS 1.1.0 and NIFTY Cloud New Features
20141102 VyOS 1.1.0 and NIFTY Cloud New Features
 
VXLAN: Enhancements and Network Integration
VXLAN: Enhancements and Network Integration VXLAN: Enhancements and Network Integration
VXLAN: Enhancements and Network Integration
 
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
Virtualizing the Network to enable a Software Defined Infrastructure (SDI)
 
Power vc for powervm deep dive tips &amp; tricks
Power vc for powervm deep dive tips &amp; tricksPower vc for powervm deep dive tips &amp; tricks
Power vc for powervm deep dive tips &amp; tricks
 
2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL2014/09/02 Cisco UCS HPC @ ANL
2014/09/02 Cisco UCS HPC @ ANL
 
如何用k8s打造國產5G NFV平臺? 剖析經濟部5G核網技術的關鍵
如何用k8s打造國產5G NFV平臺?剖析經濟部5G核網技術的關鍵如何用k8s打造國產5G NFV平臺?剖析經濟部5G核網技術的關鍵
如何用k8s打造國產5G NFV平臺? 剖析經濟部5G核網技術的關鍵
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstack
 
Netsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfvNetsft2017 day in_life_of_nfv
Netsft2017 day in_life_of_nfv
 
The Next Step of OpenStack Evolution for NFV Deployments
The Next Step ofOpenStack Evolution for NFV DeploymentsThe Next Step ofOpenStack Evolution for NFV Deployments
The Next Step of OpenStack Evolution for NFV Deployments
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
 
SR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/StableSR-IOV+KVM on Debian/Stable
SR-IOV+KVM on Debian/Stable
 
Ccnav5.org ccna 3-v50_final_exam_2014
Ccnav5.org ccna 3-v50_final_exam_2014Ccnav5.org ccna 3-v50_final_exam_2014
Ccnav5.org ccna 3-v50_final_exam_2014
 
Openstack v4 0
Openstack v4 0Openstack v4 0
Openstack v4 0
 
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
 
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...
PLNOG 5: Piotr Szołkowski - Data Center i nie tylko...
 

Plus de LF Events

Feature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with EncryptionFeature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with EncryptionLF Events
 
KASan in a Bare-Metal Hypervisor
 KASan in a Bare-Metal Hypervisor  KASan in a Bare-Metal Hypervisor
KASan in a Bare-Metal Hypervisor LF Events
 
Efficient kernel backporting
Efficient kernel backportingEfficient kernel backporting
Efficient kernel backportingLF Events
 
Raspberry pi Update - Encourage your IOT
Raspberry pi Update - Encourage your IOTRaspberry pi Update - Encourage your IOT
Raspberry pi Update - Encourage your IOTLF Events
 
Introduction to Open-O
Introduction to Open-OIntroduction to Open-O
Introduction to Open-OLF Events
 
CNCF and Fujitsu
CNCF and FujitsuCNCF and Fujitsu
CNCF and FujitsuLF Events
 
Linxu conj2016 96boards
Linxu conj2016 96boardsLinxu conj2016 96boards
Linxu conj2016 96boardsLF Events
 
Taking over to the Next Generation
Taking over to the Next GenerationTaking over to the Next Generation
Taking over to the Next GenerationLF Events
 
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...LF Events
 
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...Generating a Reproducible and Maintainable Embedded Linux Environment with Po...
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...LF Events
 
Secure IOT Gateway
Secure IOT GatewaySecure IOT Gateway
Secure IOT GatewayLF Events
 
Trading Derivatives on Hyperledger
Trading Derivatives on HyperledgerTrading Derivatives on Hyperledger
Trading Derivatives on HyperledgerLF Events
 
Introducing Oracle Linux and Securing It With ksplice
Introducing Oracle Linux and Securing It With kspliceIntroducing Oracle Linux and Securing It With ksplice
Introducing Oracle Linux and Securing It With kspliceLF Events
 
Containers: Don't Skeu Them Up, Use Microservices Instead
Containers: Don't Skeu Them Up, Use Microservices InsteadContainers: Don't Skeu Them Up, Use Microservices Instead
Containers: Don't Skeu Them Up, Use Microservices InsteadLF Events
 

Plus de LF Events (14)

Feature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with EncryptionFeature rich BTRFS is Getting Richer with Encryption
Feature rich BTRFS is Getting Richer with Encryption
 
KASan in a Bare-Metal Hypervisor
 KASan in a Bare-Metal Hypervisor  KASan in a Bare-Metal Hypervisor
KASan in a Bare-Metal Hypervisor
 
Efficient kernel backporting
Efficient kernel backportingEfficient kernel backporting
Efficient kernel backporting
 
Raspberry pi Update - Encourage your IOT
Raspberry pi Update - Encourage your IOTRaspberry pi Update - Encourage your IOT
Raspberry pi Update - Encourage your IOT
 
Introduction to Open-O
Introduction to Open-OIntroduction to Open-O
Introduction to Open-O
 
CNCF and Fujitsu
CNCF and FujitsuCNCF and Fujitsu
CNCF and Fujitsu
 
Linxu conj2016 96boards
Linxu conj2016 96boardsLinxu conj2016 96boards
Linxu conj2016 96boards
 
Taking over to the Next Generation
Taking over to the Next GenerationTaking over to the Next Generation
Taking over to the Next Generation
 
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
Learning From Real Practice of Providing Highly Available Hybrid Cloud Servic...
 
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...Generating a Reproducible and Maintainable Embedded Linux Environment with Po...
Generating a Reproducible and Maintainable Embedded Linux Environment with Po...
 
Secure IOT Gateway
Secure IOT GatewaySecure IOT Gateway
Secure IOT Gateway
 
Trading Derivatives on Hyperledger
Trading Derivatives on HyperledgerTrading Derivatives on Hyperledger
Trading Derivatives on Hyperledger
 
Introducing Oracle Linux and Securing It With ksplice
Introducing Oracle Linux and Securing It With kspliceIntroducing Oracle Linux and Securing It With ksplice
Introducing Oracle Linux and Securing It With ksplice
 
Containers: Don't Skeu Them Up, Use Microservices Instead
Containers: Don't Skeu Them Up, Use Microservices InsteadContainers: Don't Skeu Them Up, Use Microservices Instead
Containers: Don't Skeu Them Up, Use Microservices Instead
 

Dernier

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

SR-IOV ixgbe Driver Limitations and Improvement

  • 1. SR-IOV ixgbe driver limitations and improvement 2016-07-15 Hiroshi Shimamoto
  • 2.
  • 3. 3 © NEC Corporation 2015 NEC Group Internal Use Only3 © NEC Corporation 2016 Who am I Hiroshi Shimamoto ▌Working for NEC ▌Main activities in open source community Carrier Grade Linux Virtualization Packet Processing, DPDK to Make Linux system usable for Telecom Carriers
  • 4. Outline SR-IOV and ixgbe implementation SR-IOV ixgbe limitations for NFV Addressing the issues Future work and possible security issues
  • 5. 5 © NEC Corporation 2015 NEC Group Internal Use Only5 © NEC Corporation 2016 ▌What is SR-IOV One of device virtualization technology Most of people here may know it ▌In SR-IOV enabled PCI device provides PF (Physical Function) and VF (Virtual Function) VF will be used directly in VM PCI device SR-IOV VF 0 PF VF 1 VF 2 ... VM VM VM PCI pass through
  • 6. 6 © NEC Corporation 2015 NEC Group Internal Use Only6 © NEC Corporation 2016 Why SR-IOV? ▌I/O in virtual machines is a performance bottleneck ▌Especially packet switching between VM and host ▌Technologies were introduced to address network performance PCI pass through SR-IOV DPDK
  • 7. 7 © NEC Corporation 2015 NEC Group Internal Use Only7 © NEC Corporation 2016 SR-IOV and etc. comparison ▌Brief pros and cons each technology PCI Pass through SR-IOV DPDK pros Hardware performance Multiple VMs Hardware performance Multiple VMs Software flexibility No special driver in guest cons Single VM Device support VF driver in guest Consume much CPU power
  • 8. 8 © NEC Corporation 2015 NEC Group Internal Use Only8 © NEC Corporation 2016 How SR-IOV implemented in Intel 82599 ▌Target device: Intel 82599 10GbE Controller and ixgbe driver for PF and ixgbevf driver for VF ▌There are 64 VMDq (Virtual Machine Device queue) in 82599 Calls this queue pool For SR-IOV, each VF has associated pool 82599 switches a packet to pool (VF) ▌The current ixgbe driver map VF pool 0 0 1 1 : : N N PF N+1 Intel 82599 10GbE controller datasheet URL http://www.intel.com/content/www/us/en/embedded/products/networking/82599-10-gbe-controller-datasheet.html
  • 9. 9 © NEC Corporation 2015 NEC Group Internal Use Only9 © NEC Corporation 2016 Packet switching in Intel 82599 ▌In SR-IOV mode ▌82599 chip switches a packet to corresponding pool ref. datasheet 7.10.3 Packet Switching 82599 NIC VF 0 PFVF 1 VF 2 ... Packet Switch Rx Tx pools
  • 10. 10 © NEC Corporation 2015 NEC Group Internal Use Only10 © NEC Corporation 2016 Rx packet switching ▌Receive a packet from outside ▌Step 1. Exact unicast or multicast match 2. Broadcast 3. Unicast hash* 4. Multicast hash 5. Multicast promiscuous* 6. VLAN group 7. Default pool 8. Ethertype filters 9. PFVFRE 10.Mirroring* VF 0 PFVF 1 VF 2 ... Packet Switch Note* driver didn’t support these features ref. datasheet 7.10.3.3 Rx Packet Switching
  • 11. 11 © NEC Corporation 2015 NEC Group Internal Use Only11 © NEC Corporation 2016 Tx packet switching ▌Receive a packet from device (PF and/or VF) ▌Step 1. Exact unicast or multicast match 2. Broadcast 3. Unicast hash* 4. Multicast hash 5. Multicast promiscuous* 6. Filer source pool 7. VLAN groups 8. Forwarding to the network 9. PFVFRE 10.Mirroring* 11.PFVFRE VF 0 PFVF 1 VF 2 ... Packet Switch Note* driver didn’t support these features ref. datasheet 7.10.3.4 Tx Packet Switching
  • 12. 12 © NEC Corporation 2015 NEC Group Internal Use Only12 © NEC Corporation 2016 Use Intel 82599 NIC SR-IOV as switch ▌Typical use case Intel 82599 NIC PFVF 2 ... Virtualization VM0 Guest OS VF driver VF 0 NIC VM1 Guest OS VF 1 VF driver NIC switch
  • 13. 13 © NEC Corporation 2015 NEC Group Internal Use Only13 © NEC Corporation 2016 NFV (Network Function Virtualization) ▌Using virtualization technology, implements Network Functions on general purpose hardware, to improve flexibility, agility and efficiency ▌Network Function is virtualized  VNF (Virtual Network Function) is realized on VM Virtualization Layer Orchestration/Resource Management IMS VM VM EPC VM VM L3SW VM VM GW VM VM Data Base VM VM App Server VM VM IMS EPC L3SW GW Data Base App Server Dedicated/Specific Hardware General Purpose Hardware
  • 14. 14 © NEC Corporation 2015 NEC Group Internal Use Only14 © NEC Corporation 2016 SR-IOV ixgbe driver limitations for NFV ▌Using Intel 82599 and SR-IOV, 3 critical limitations for NFV  VLAN filtering  Multicast addresses  Unicast promiscuous Come from hardware limitation and software (driver) limitation ▌Explain with 2 use cases Router Layer 2 switch Intel 82599 NIC PFVF 2 ... Virtualization VNF Guest OS VF driver VF 0 NIC VNF Guest OS VF 1 VF driver NIC switch NFV use case VNF is run on VM
  • 15. 15 © NEC Corporation 2015 NEC Group Internal Use Only15 © NEC Corporation 2016 NFV use case 1 ▌Network Function: Router VLAN is used to virtualize the network Access NetworkUser Equipment Private Network VNF VLAN:1 VLAN:2 VLAN:3 VLAN:N ... Terminates many VLANs and IPs ...
  • 16. 16 © NEC Corporation 2015 NEC Group Internal Use Only16 © NEC Corporation 2016 VLAN filtering limitation ▌Intel 82599 has hardware VLAN filter ▌The problem comes from  VLAN filter has only 64 entries  Driver always enable VLAN filter if SR-IOV enabled  Only 64 VLANs can be used with SR-IOV
  • 17. 17 © NEC Corporation 2015 NEC Group Internal Use Only17 © NEC Corporation 2016 Example: VF device returns error against adding VLAN ▌Try to create 64 VLANs on VM # for i in `seq 1 64`; do > echo "vlan $i" > ip link add link ens6 name ens6.$i type vlan id $i > done vlan 1 vlan 2 : vlan 63 vlan 64 RTNETLINK answers: Permission denied Note The ixgbe driver use the first entry for VLAN 0, that means actual the number of VLANs is 63
  • 18. 18 © NEC Corporation 2015 NEC Group Internal Use Only18 © NEC Corporation 2016 Enabling VLAN filter automatically in driver (latest) ▌upstream static void ixgbe_vlan_promisc_enable(struct ixgbe_adapter *adapter) { : switch (hw->mac.type) { case ixgbe_mac_82599EB: case ixgbe_mac_X540: case ixgbe_mac_X550: case ixgbe_mac_X550EM_x: case ixgbe_mac_x550em_a: default: if (adapter->flags & IXGBE_FLAG_VMDQ_ENABLED) break; /* fall through */ case ixgbe_mac_82598EB: /* legacy case, we can just disable VLAN filtering */ vlnctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL); vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN); IXGBE_WRITE_REG(hw, IXGBE_VLNCTRL, vlnctrl); return; }
  • 19. 19 © NEC Corporation 2015 NEC Group Internal Use Only19 © NEC Corporation 2016 Enabling VLAN filter automatically in driver (old) ▌previous version void ixgbe_set_rx_mode(struct net_device *netdev) { : : vlnctrl &= ~(IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN); if (netdev->flags & IFF_PROMISC) { hw->addr_ctrl.user_set_promisc = true; fctrl |= (IXGBE_FCTRL_UPE | IXGBE_FCTRL_MPE); vmolr |= IXGBE_VMOLR_MPE; /* Only disable hardware filter vlans in promiscuous mode * if SR-IOV and VMDQ are disabled - otherwise ensure * that hardware VLAN filters remain enabled. */ if (adapter->flags & (IXGBE_FLAG_VMDQ_ENABLED | IXGBE_FLAG_SRIOV_ENABLED)) vlnctrl |= (IXGBE_VLNCTRL_VFE | IXGBE_VLNCTRL_CFIEN);
  • 20. 20 © NEC Corporation 2015 NEC Group Internal Use Only20 © NEC Corporation 2016 Multicast address limitation ▌The interface (PF-VF mailbox API) limits the number of addresses ▌Only first 30 multicast addresses can be registered ▌Overflowed addresses are silently dropped  Having Multicast promiscuous is a solution /* Each entry in the list uses 1 16 bit word. We have 30 * 16 bit words available in our HW msg buffer (minus 1 for the * msg type). That's 30 hash values if we pack 'em right. If * there are more than 30 MC addresses to add then punt the * extras for now and then add code to handle more than 30 later. * It would be unusual for a server to request that many multi-cast * addresses except for in large enterprise network environments. */ cnt = netdev_mc_count(netdev); if (cnt > 30) cnt = 30; msgbuf[0] = IXGBE_VF_SET_MULTICAST; msgbuf[0] |= cnt << IXGBE_VT_MSGINFO_SHIFT;
  • 21. 21 © NEC Corporation 2015 NEC Group Internal Use Only21 © NEC Corporation 2016 Why so many multicast addresses? ▌Our case is to support many IPv6 addresses on VF For Neighbor Discovery, each IPv6 address requires corresponding multicast address Unicast address Multicast MAC 2001:0000 0000:0000 0000:0000 00 12:3456 ff02:0000 0000:0000 0000:0001 ff 12:3456 33:33 ff:12:34:56 Solicited node multicast address
  • 22. 22 © NEC Corporation 2015 NEC Group Internal Use Only22 © NEC Corporation 2016 Example: Cannot receive multicast packet ▌Some multicast packets are not passed to VF, then failed to ND ▌Assign 30 IPv6 addresses on VM (different network address) ▌Ping from other physical machine to VM # for i in `seq 1 30`; do > ip -6 addr add 2001:$i::1:$i/64 dev ens6 > done # for i in `seq 1 30`; do ping6 –w 1 –c 1 –I eth2 2001:$i::1:$i; done : PING 2001:28::1:28(2001:28::1:28) from 2001:28::1 eth2: 56 data bytes --- 2001:28::1:28 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1000ms
  • 23. 23 © NEC Corporation 2015 NEC Group Internal Use Only23 © NEC Corporation 2016 NFV use case 2 ▌Network Function: L2 switch Access NetworkUser Equipment Private Network VNF ... MAC:A MAC:B Left to right both of MAC:A and MAC:B must be able to receive through this VF dst:A dst:B
  • 24. 24 © NEC Corporation 2015 NEC Group Internal Use Only24 © NEC Corporation 2016 Unicast promiscuous ▌Only packets which match the registered MAC can be received ▌VNF should be able to handle every MAC addresses in network ▌Hard to register all MAC addresses in L2 network  We want Unicast promiscuous feature in VF
  • 25. 25 © NEC Corporation 2015 NEC Group Internal Use Only25 © NEC Corporation 2016 SR-IOV limitations for NFV and current status ▌VLAN filtering Only 64 VLANs can be used  Proposed to add an option to disable hardware VLAN filter, but not accepted ▌Multicast addresses Only 30 Multicast addresses can be used  Implemented VF Multicast promiscuous mode in ixgbe/ixgbevf driver in Linux 4.4 ▌Unicast promiscuous Single unicast MAC address can be used  Hardware limitation
  • 26. 26 © NEC Corporation 2015 NEC Group Internal Use Only26 © NEC Corporation 2016 Addressing Multicast addresses limitation ▌There is a hardware feature in 82599 VF Multicast promiscuous mode ▌But driver didn’t support that feature ▌Implement new PF-VF mailbox API in ixgbe and ixgbevf ▌First, automatically enable to VF multicast promiscuous when the number of addresses overs 30 ▌Suggested way, implement xcast mode in VF There is ALLMULTI flag that means that every multicast packet is received in this device  Accepted in Linux 4.4
  • 27. 27 © NEC Corporation 2015 NEC Group Internal Use Only27 © NEC Corporation 2016 PF-VF mailbox APIs ▌There are mailbox APIs Communicate between PF and VF version API description legacy RESET VF requests reset SET_MAC_ADDR VF requests PF to set MAC addr SET_MULTICAST VF requests PF to set MC addr SET_VLAN VF requests PF to set VLAN 1.0 SET_LPE VF requests PF to set VMOLR.LPE SET_MACVLAN VF requests PF for unicast filter API_NEGOTIATE negotiate API version 1.1 GET_QUEUES get queue configuration 1.2 GET_RETA VF requests for RETA GET_RSS_KEY get RSS key UPDATE_XCAST_MODE VF requests PF to set MC mode
  • 28. 28 © NEC Corporation 2015 NEC Group Internal Use Only28 © NEC Corporation 2016 Implementation (PF ixgbe) ▌Handle UPDATE_XCAST_MODE API --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c : @@ -1066,6 +1122,9 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf) case IXGBE_VF_GET_RSS_KEY: retval = ixgbe_get_vf_rss_key(adapter, msgbuf, vf); break; + case IXGBE_VF_UPDATE_XCAST_MODE: + retval = ixgbe_update_vf_xcast_mode(adapter, msgbuf, vf); + break; default: e_err(drv, "Unhandled Msg %8.8x¥n", msgbuf[0]); retval = IXGBE_ERR_MBX;
  • 29. 29 © NEC Corporation 2015 NEC Group Internal Use Only29 © NEC Corporation 2016 Implementation (VF ixgbevf) ▌Request to PF (from ixgbevf_set_rx_mode) --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -1894,9 +1894,17 @@ static void ixgbevf_set_rx_mode(struct net_device *netdev) { struct ixgbevf_adapter *adapter = netdev_priv(netdev); struct ixgbe_hw *hw = &adapter->hw; + unsigned int flags = netdev->flags; + int xcast_mode; + + xcast_mode = (flags & IFF_ALLMULTI) ? IXGBEVF_XCAST_MODE_ALLMULTI : + (flags & (IFF_BROADCAST | IFF_MULTICAST)) ? + IXGBEVF_XCAST_MODE_MULTI : IXGBEVF_XCAST_MODE_NONE; spin_lock_bh(&adapter->mbx_lock); + hw->mac.ops.update_xcast_mode(hw, netdev, xcast_mode); + /* reprogram multicast list */ hw->mac.ops.update_mc_addr_list(hw, netdev);
  • 30. 30 © NEC Corporation 2015 NEC Group Internal Use Only30 © NEC Corporation 2016 Security consideration ▌Enabling VF Multicast promiscuous mode causes security issues  Can see all multicast packets through this device  Can hurt performance NIC duplicates packets and does DMA to each pool  Make only trusted VF can enable Multicast promiscuous mode static int ixgbe_update_vf_xcast_mode(struct ixgbe_adapter *adapter, u32 *msgbuf, u32 vf) { : if (xcast_mode > IXGBEVF_XCAST_MODE_MULTI && !adapter->vfinfo[vf].trusted) { xcast_mode = IXGBEVF_XCAST_MODE_MULTI; }
  • 31. 31 © NEC Corporation 2015 NEC Group Internal Use Only31 © NEC Corporation 2016 Implement functionality to trust VF ▌Add new operation “set_vf_trust” in net_device_ops ▌Also add support VF trust operation in iproute2 (ip command) # ip link set dev enp3s0f0 vf 1 trust on (dmesg) kernel: ixgbe 0000:03:00.0 enp3s0f0: VF 1 is trusted kernel: ixgbevf 0000:03:10.2: NIC Link is Down kernel: ixgbe 0000:03:00.0 enp3s0f0: VF Reset msg received from vf 1 kernel: ixgbevf 0000:03:10.2: NIC Link is Up 10 Gbps # ip link set dev enp3s0f0 vf 1 trust off (dmesg) kernel: ixgbe 0000:03:00.0 enp3s0f0: VF 1 is not trusted kernel: ixgbevf 0000:03:10.2: NIC Link is Down kernel: ixgbe 0000:03:00.0 enp3s0f0: VF Reset msg received from vf 1 kernel: ixgbevf 0000:03:10.2: NIC Link is Up 10 Gbps Note When trusted state is changed, target vf is reset
  • 32. 32 © NEC Corporation 2015 NEC Group Internal Use Only32 © NEC Corporation 2016 Future work and possible security issues ▌Still, there are limitations  VLAN filtering  Unicast promiscuous ▌Possible security issues  VLAN filter is not isolated  Multicast hash table is not handled strictly
  • 33. 33 © NEC Corporation 2015 NEC Group Internal Use Only33 © NEC Corporation 2016 VLAN filtering ▌Disabling hardware VLAN filtering may solve this issue ▌But  It could break existing network feature (DCB, FCoE)  Broadcast(Multicast) storm could cause performance degradation  Security issue, BC/MC packets can be seen in every VF ▌Maybe okay if the network and VMs are well managed ▌Another point is that there is no suitable knob to do it now What command is right to turn VLAN filter off in general
  • 34. 34 © NEC Corporation 2015 NEC Group Internal Use Only34 © NEC Corporation 2016 Unicast promiscuous ▌Supporting this feature in hardware is the best ▌No VF unicast promiscuous feature in 82599 unfortunately
  • 35. 35 © NEC Corporation 2015 NEC Group Internal Use Only35 © NEC Corporation 2016 Hardware support in X540 and X550 ▌Later NIC chips, X540 and X550 support VLAN promiscuous and Unicast promiscuous mode per VF ▌The issues could be solved with X540/X550 chip ▌To make framework for X540/X550 and use the same semantics for 82599 may be needed Field Bit(s) Description Reserved 23:0 21:0 Reserved UPE 22 Unicast Promiscuous Enable VPE 23 VLAN Promiscuous Enable AUPE 24 Accept Untagged Packet Enable ROMPE 25 Receive Overflow Multicast Packets ROPE 26 Receive MAC Filters Overflow BAM 27 Broadcast Accept MPE 28 Multicast Promiscuous Enable Reserved 31:29 Reserved PF VM L2Control Register new feature bit
  • 36. 36 © NEC Corporation 2015 NEC Group Internal Use Only36 © NEC Corporation 2016 Ideas ▌Mirroring ▌Unicast hash Those features are not used/supported in driver
  • 37. 37 © NEC Corporation 2015 NEC Group Internal Use Only37 © NEC Corporation 2016 Possible security issues ▌In the current ixgbe/ixgbevf implementation, there are 2 issues to be considered  VLAN filter is not isolated Single VLAN filter table No limitation to request new VLAN from VF If a VM requests 64 VLANs, other VMs never use different VLANs  Single multicast hash table Using multicast hash table for switching multicast packet to VF SET_MULTICAST API is only for setting hash value, no unset functionality Manipulating IP address assignment can make VF to have multicast promiscuous behavior
  • 38. 38 © NEC Corporation 2015 NEC Group Internal Use Only38 © NEC Corporation 2016 VLAN filter is not isolated ▌If a VM uses all VLANs, other VM can’t make new VLAN (VM0) # for i in `seq 1 64`; do > echo "vlan $i" > ip link add link ens6 name ens6.$i type vlan id $i > done vlan 1 vlan 2 : vlan 63 vlan 64 RTNETLINK answers: Permission denied (VM1) # ip link add link ens6 name ens6.64 type vlan id 64 RTNETLINK answers: Permission denied and if VM does not shutdown gracefully, registered VLAN filter entry remains
  • 39. 39 © NEC Corporation 2015 NEC Group Internal Use Only39 © NEC Corporation 2016 Single multicast hash table ▌ixgbe driver uses multicast hash table (MTA register) ▌Multicast Table Array is a 4Kb bitmap i. 82599 make 12 bits hash value from MAC ii. If corresponding bit in MTA is set, it means hit iii. Check VF capability PFVFL2FLT.ROMPE and transfer packet  If every bits in MTA are set, every multicast packet matches and transferred to pool ▌MTA bit is set by SET_MULTICAST API and no unset API ▌PFVFL2FLT.ROMPE bit always set by receiving SET_MULTICAST API message ▌Long living host may have unnecessary bits in MTA
  • 40. 40 © NEC Corporation 2015 NEC Group Internal Use Only40 © NEC Corporation 2016 Example: Can receive all multicast packet ▌Invoke SET_MULTICAST from VF each IP address ▌Assign 30 IP addresses again ▌Ping from other physical machine to VM # for i in `seq 1 30`; do > ip -6 addr add 2001:$i::1:$i/64 dev ens6 > ip -6 addr del 2001:$i::1:$i/64 dev ens6 > done # for i in `seq 1 30`; do ping6 –w 1 –c 1 –I eth2 2001:$i::1:$i; done : PING 2001:30::1:30(2001:30::1:30) from 2001:30::1 eth2: 56 data bytes 64 bytes from 2001:30::1:30: icmp_seq=1 ttl=64 time=0.947 ms --- 2001:30::1:30 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.947/0.947/0.947/0.000 ms # for i in `seq 1 30`; do > ip -6 addr add 2001:$i::1:$i/64 dev ens6 > done
  • 41. 41 © NEC Corporation 2015 NEC Group Internal Use Only41 © NEC Corporation 2016 Summary ▌SR-IOV in Intel 82599 and ixgbe driver implementation ▌SR-IOV ixgbe driver limitations for NFV VLAN filtering Multicast addresses Unicast promiscuous ▌Addressing Multicast addresses issue Implement new mailbox API and ndo VF trust ▌Future work and possible security issues ▌Questions? ▌E-Mail: h-shimamoto@ct.jp.nec.com