SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Lendo e apresentando:
Forwarding Metamorphosis:
Fast Programmable Match-
Action Processing in Hardware
for SDN
Bruno Castelucci – 2013.10.31
1
References:
Forwarding Metamorphosis:
Fast Programmable Match-Action
Processing in Hardware for SDN
(Pat Bosshart, Glen Gibb, Hun-Seok Kim,
George Varghese, Nick McKeown, Martin Izzard,
Fernando Mujica, Mark Horowitz)
SIGCOMM - 2013.08.13
Reading:
Forwarding Metamorphosis: Fast Programmable
Match-Action Processing in Hardware for SDN
(Ji Yang)
2013.09.23
2
Objetivos
• O Papper descreve a implementação de um
Switch de 64 ports de 10Gbps usando o modelo
RMT (Reconfigurable Match Tables).
• Desenhar uma arquitetura para o RMT.
• Mostrar alguns casos de uso.
3
Exemplo: Switch padrão
4
Deparser
In
Queues
Data
OutACL
Stage
L3
Stage
L2
Stage
Action:setL2D
Stage 1
L2Table
L2: 128k x 48
Exact match
Action:setL2D,decTTL
Stage 2
L3Table
L3: 16k x 32
Longest prefix
match
Action:permit/deny
Stage 3ACLTable
ACL: 4k
Ternary match
Parser
X X X X X
Access Control List (ACL)
Time to live (TTL)
Forwarding Plane is Challenged
• Current Hardware switches are rigid
▫ Match Action can process limited set fields
• OpenFlow/SDN protocol defines a limited
repertoire of packet processing actions
▫ Updated frequently
• Hardware still has its vantages
▫ Speed: Faster than CPU
▫ Pipeline, parallelism
What if you need flexibility?
• Flexibility to:
▫ Trade one memory size for another
▫ Add a new table
▫ Add a new header field
▫ Add a different action
• SDN accentuates the need for flexibility
▫ Gives programmatic control to control plane, expects
to be able to use flexibility
 Multiple stages of match-action
 Flexible actions
 Flexible header fields
6
What about Alternatives?
Aren’t there other ways to get
flexibility?
• Software? 100x too slow, expensive
• NPUs? 10x too slow, expensive
• FPGAs? 10x too slow, expensive
7
What’s Hard about a
Flexible Switch Chip?
• Big chip
• Wiring intensive
• Many crossbars
• Lots of TCAM
• Operate in high frequency – 1TBps
8
The RMT Model
9
Hardware Architectures for SDN
• Single Match Table
▫ Easier to implement.
▫ Needs to store every combination of headers: too big.
• Multiple Match Tables
▫ Allows multiple smaller match tables to be matched by a subset of packet fields.
▫ Number of and size of are limited: hard to optimize and add new behaviors
▫ Very few actions implemented today on current chips.
• Reconfigurable Match Tables
▫ Work under pipeline
▫ New fields added easily
▫ Number, topology, widths, depths of match tables can be configured
▫ New actions defined easily
▫ Packets can be placed in specified queues (queuing discipline specified)
• RMT permite que o plano de encaminhamento seja alterado "on the run" sem
modificar o hardware.
Como no Openflow com MMT, o programador pode especificar match tables,
limitado somente ao tamanho do hardware, porém, o RMT permite modificar
todos os header fields do pacote com mais flexibilidade.
RMT Consists of:
11
•Reconfigurable parser
•Support field definition
•Resizable match
•VLIW: very long instruction word
•Configurable output queue
How is it configured?
RMT Parse Graph and Table Graph
12
Compiler and control protocol not available yet.
But the Parse Graph and Table
Graph don’t show you how to build
a switch
13
Performance vs Flexibility
• Multiprocessor: memory bottleneck
• Change to pipeline
• Fixed function chips specialize processors
• Flexible switch needs general purpose CPUs
14
Memory
Memory
Memory CPU
CPU
CPU
L2 L3 ACL
Access Control List (ACL)
Memory
C
P
U
C
P
U
C
P
U
C
P
U
C
P
U
C
P
U
C
P
U
C
P
U
C
P
U
How We Did It
• Memory to CPU bottleneck
• Replicate CPUs
• More stages for finer granularity
• Higher CPU cost ok
15
C
P
U
C
P
U
C
P
U
Antes, definições de memória
• RAM –
▫ binário (0, 1), busca em loop por endereços, ou hash
table, busca lenta.
• CAM – Content Addressable Memory –
▫ binário, associative memory, o conteudo é buscado
em todos os campos ao mesmo tempo, busca rapida.
• TCAM – Ternary CAM –
▫ ternária (0, 1, X), wildcard match, busca rápida. Prefix
matching.
16
RMT Consists of:
17
Parser
18
•Recebe o pacote de produz o header vector de 4kb.
•O parse é feito baseado num gráfico provido pelo usuário convertido em
entradas numa TCAM de 256 x 40b entradas.
•Cada interação tem um tamanho de campo de header específico e gera um
campo no vetor de saida, o parser volta ao inicio para tratar o próximo
campo.
•O processo de parse é feito sequencialmente até que todo o header seja
trabalhado, e todo o vector header esteja pronto.
32x Match
Stages
• Cada estágio fisico tem memórias
SRAM e TCAM e Action CPUs
associadas independentes.
▫ Cada physical match stage tem
subunidades de SRAM (8x 80b) 640b e
de TCAM (16x 40b) 640b.
• Cada subunidade pode trabalhar:
▫ independentemente,
▫ em grupos para maiores campos
maiores,
▫ ou em conjuntos para tabelas mais
extensas.
• Cada match stage tem adicionais 106
RAM Blocks de 1k x 112b e 16 blocos de
TCAM de 2k x 40b.
▫ A fração configurada para memórias de
match, action e estatistica é configurável.
▫ Todas as leituras são realizadas em
paralelo.
• Em cada entrada de match table na
RAM está associado um ponteiro para a
action memory e size, instruction
memory e um next table address.
19
TCAM
640b
640b
Physical
Stage n
Physical
Stage 2
Logical
Table 1
Ethertype
Logical Table 6
L2D
8 UDP
2
VLA
N
3
IPV4
5
IPV6
4
L2S
7 TCP
SRAM
HASH
Physical
Stage 1
RMT Logical to Physical Table Mapping
20
ACL
UDPTCP
L2S
L2D
IPV4
ETH
VLAN
IPV6
9 ACL
Table Graph
Action
MatchTable
Action
MatchTable
Action
MatchTable
Access Control List (ACL)
Instruction
ALU
Match result
Action Processing Model
21
HeaderIn
FieldField
Data
HeaderOut
ALU
VLIW Instructions
Match result
Modeled as Multiple VLIW CPUs per Stage
ALU
ALU
ALU
ALU
ALU
ALU
ALU
ALU
22
VLIW Instructions
23
Reducing Latency
• Dependency Analysis
Hardware Implementation
• 64 x 10Gb ports
▫ 960M packets/second
▫ 1GHz pipeline
• Packet header Parser
▫ 4Kb vector
▫ 256*40b TCAM,
• Physical Stages N=32
▫ 106 blocks SRAM with 1K*112bits: for
80bits width Hash
▫ 16 blocks TCAM with 2K*40bits
▫ Total:370Mb SRAM, 40Mb TCAM,
support combine in depth or width
• Action units
– 200+ for each stage
– Each for one field
– 7k+ units total
• Queues
– 2K queues per port
– 4 levels of hierarchy
– Allowing various combinations of
deficit round robin, hierarchical fair
queuing, token buckets, priorities.
Cost of Configurability:
Comparison with Conventional Switch
• Many functions identical: I/O, data buffer, queueing…
• Make extra functions optional: statistics
• Memory dominates area
▫ Compare memory area/bit and bit count
• RMT must use memory bits efficiently to compete on
cost
• Techniques for flexibility
26
Chip Comparison with Fixed Function Switches
Section Area % of chip Extra Cost
IO, buffer, queue, CPU, etc 37% 0.0%
Match memory & logic 54.3% 8.0%
VLIW action engine 7.4% 5.5%
Parser + deparser 1.3% 0.7%
Total extra area cost 14.2%
27
Section Power % of
chip
Extra Cost
I/O 26.0% 0.0%
Memory leakage 43.7% 4.0%
Logic leakage 7.3% 2.5%
RAM active 2.7% 0.4%
TCAM active 3.5% 0.0%
Logic active 16.8% 5.5%
Total extra power cost 12.4%
Area
Power
Conclusion
• How do we design a flexible chip?
▫ The RMT switch model
▫ Bring processing close to the memories:
 pipeline of many stages
▫ Bring the processing to the wires:
 224 action CPUs per stage
• How much does it cost?
▫ 15% more than existing switches
28

Contenu connexe

Tendances

Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Deepak Shankar
 

Tendances (20)

CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Qemu Pcie
Qemu PcieQemu Pcie
Qemu Pcie
 
Network attached storage (nas)
Network attached storage (nas)Network attached storage (nas)
Network attached storage (nas)
 
NVMe over Fabric
NVMe over FabricNVMe over Fabric
NVMe over Fabric
 
Presentation on nfs,afs,vfs
Presentation on nfs,afs,vfsPresentation on nfs,afs,vfs
Presentation on nfs,afs,vfs
 
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netReceive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
 
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesPCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
 
RBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason DillamanRBD: What will the future bring? - Jason Dillaman
RBD: What will the future bring? - Jason Dillaman
 
Parallel Computing on the GPU
Parallel Computing on the GPUParallel Computing on the GPU
Parallel Computing on the GPU
 
Bus Standards and Networking
Bus Standards and NetworkingBus Standards and Networking
Bus Standards and Networking
 
14 superscalar
14 superscalar14 superscalar
14 superscalar
 
Memory management
Memory managementMemory management
Memory management
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
 
Win idea manual
Win idea manualWin idea manual
Win idea manual
 
Tổng quan công nghệ Net backup - Phần 2
Tổng quan công nghệ Net backup - Phần 2Tổng quan công nghệ Net backup - Phần 2
Tổng quan công nghệ Net backup - Phần 2
 
EMC Vipr srm-technical Deep dive
EMC Vipr srm-technical Deep diveEMC Vipr srm-technical Deep dive
EMC Vipr srm-technical Deep dive
 
AMD processors
AMD processorsAMD processors
AMD processors
 
Understanding das-nas-san
Understanding das-nas-sanUnderstanding das-nas-san
Understanding das-nas-san
 
INTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASKINTERRUPT LATENCY AND RESPONSE OF THE TASK
INTERRUPT LATENCY AND RESPONSE OF THE TASK
 
DPDK
DPDKDPDK
DPDK
 

Similaire à Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...
Jorge E. López de Vergara Méndez
 
osdi20-slides_zhao.pptx
osdi20-slides_zhao.pptxosdi20-slides_zhao.pptx
osdi20-slides_zhao.pptx
Cive1971
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
Young Alista
 

Similaire à Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN (20)

lect13_programmable_dp.pptx
lect13_programmable_dp.pptxlect13_programmable_dp.pptx
lect13_programmable_dp.pptx
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Theta and the Future of Accelerator Programming
Theta and the Future of Accelerator ProgrammingTheta and the Future of Accelerator Programming
Theta and the Future of Accelerator Programming
 
DPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles ShiflettDPDK Summit 2015 - Aspera - Charles Shiflett
DPDK Summit 2015 - Aspera - Charles Shiflett
 
Memory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and VirtualizationMemory, Big Data, NoSQL and Virtualization
Memory, Big Data, NoSQL and Virtualization
 
On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...On the feasibility of 40 Gbps network data capture and retention with general...
On the feasibility of 40 Gbps network data capture and retention with general...
 
osdi20-slides_zhao.pptx
osdi20-slides_zhao.pptxosdi20-slides_zhao.pptx
osdi20-slides_zhao.pptx
 
Accelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket CacheAccelerating HBase with NVMe and Bucket Cache
Accelerating HBase with NVMe and Bucket Cache
 
Designing for High Performance Ceph at Scale
Designing for High Performance Ceph at ScaleDesigning for High Performance Ceph at Scale
Designing for High Performance Ceph at Scale
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
[OpenStack Days Korea 2016] Track3 - OpenStack on 64-bit ARM with X-Gene
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
 
Brkdct 3101
Brkdct 3101Brkdct 3101
Brkdct 3101
 
Cpu Caches
Cpu CachesCpu Caches
Cpu Caches
 

Dernier

notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Dernier (20)

notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects2016EF22_0 solar project report rooftop projects
2016EF22_0 solar project report rooftop projects
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 

Revisão: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN

  • 1. Lendo e apresentando: Forwarding Metamorphosis: Fast Programmable Match- Action Processing in Hardware for SDN Bruno Castelucci – 2013.10.31 1
  • 2. References: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN (Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, Mark Horowitz) SIGCOMM - 2013.08.13 Reading: Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN (Ji Yang) 2013.09.23 2
  • 3. Objetivos • O Papper descreve a implementação de um Switch de 64 ports de 10Gbps usando o modelo RMT (Reconfigurable Match Tables). • Desenhar uma arquitetura para o RMT. • Mostrar alguns casos de uso. 3
  • 4. Exemplo: Switch padrão 4 Deparser In Queues Data OutACL Stage L3 Stage L2 Stage Action:setL2D Stage 1 L2Table L2: 128k x 48 Exact match Action:setL2D,decTTL Stage 2 L3Table L3: 16k x 32 Longest prefix match Action:permit/deny Stage 3ACLTable ACL: 4k Ternary match Parser X X X X X Access Control List (ACL) Time to live (TTL)
  • 5. Forwarding Plane is Challenged • Current Hardware switches are rigid ▫ Match Action can process limited set fields • OpenFlow/SDN protocol defines a limited repertoire of packet processing actions ▫ Updated frequently • Hardware still has its vantages ▫ Speed: Faster than CPU ▫ Pipeline, parallelism
  • 6. What if you need flexibility? • Flexibility to: ▫ Trade one memory size for another ▫ Add a new table ▫ Add a new header field ▫ Add a different action • SDN accentuates the need for flexibility ▫ Gives programmatic control to control plane, expects to be able to use flexibility  Multiple stages of match-action  Flexible actions  Flexible header fields 6
  • 7. What about Alternatives? Aren’t there other ways to get flexibility? • Software? 100x too slow, expensive • NPUs? 10x too slow, expensive • FPGAs? 10x too slow, expensive 7
  • 8. What’s Hard about a Flexible Switch Chip? • Big chip • Wiring intensive • Many crossbars • Lots of TCAM • Operate in high frequency – 1TBps 8
  • 10. Hardware Architectures for SDN • Single Match Table ▫ Easier to implement. ▫ Needs to store every combination of headers: too big. • Multiple Match Tables ▫ Allows multiple smaller match tables to be matched by a subset of packet fields. ▫ Number of and size of are limited: hard to optimize and add new behaviors ▫ Very few actions implemented today on current chips. • Reconfigurable Match Tables ▫ Work under pipeline ▫ New fields added easily ▫ Number, topology, widths, depths of match tables can be configured ▫ New actions defined easily ▫ Packets can be placed in specified queues (queuing discipline specified) • RMT permite que o plano de encaminhamento seja alterado "on the run" sem modificar o hardware. Como no Openflow com MMT, o programador pode especificar match tables, limitado somente ao tamanho do hardware, porém, o RMT permite modificar todos os header fields do pacote com mais flexibilidade.
  • 11. RMT Consists of: 11 •Reconfigurable parser •Support field definition •Resizable match •VLIW: very long instruction word •Configurable output queue
  • 12. How is it configured? RMT Parse Graph and Table Graph 12 Compiler and control protocol not available yet.
  • 13. But the Parse Graph and Table Graph don’t show you how to build a switch 13
  • 14. Performance vs Flexibility • Multiprocessor: memory bottleneck • Change to pipeline • Fixed function chips specialize processors • Flexible switch needs general purpose CPUs 14 Memory Memory Memory CPU CPU CPU L2 L3 ACL Access Control List (ACL)
  • 15. Memory C P U C P U C P U C P U C P U C P U C P U C P U C P U How We Did It • Memory to CPU bottleneck • Replicate CPUs • More stages for finer granularity • Higher CPU cost ok 15 C P U C P U C P U
  • 16. Antes, definições de memória • RAM – ▫ binário (0, 1), busca em loop por endereços, ou hash table, busca lenta. • CAM – Content Addressable Memory – ▫ binário, associative memory, o conteudo é buscado em todos os campos ao mesmo tempo, busca rapida. • TCAM – Ternary CAM – ▫ ternária (0, 1, X), wildcard match, busca rápida. Prefix matching. 16
  • 18. Parser 18 •Recebe o pacote de produz o header vector de 4kb. •O parse é feito baseado num gráfico provido pelo usuário convertido em entradas numa TCAM de 256 x 40b entradas. •Cada interação tem um tamanho de campo de header específico e gera um campo no vetor de saida, o parser volta ao inicio para tratar o próximo campo. •O processo de parse é feito sequencialmente até que todo o header seja trabalhado, e todo o vector header esteja pronto.
  • 19. 32x Match Stages • Cada estágio fisico tem memórias SRAM e TCAM e Action CPUs associadas independentes. ▫ Cada physical match stage tem subunidades de SRAM (8x 80b) 640b e de TCAM (16x 40b) 640b. • Cada subunidade pode trabalhar: ▫ independentemente, ▫ em grupos para maiores campos maiores, ▫ ou em conjuntos para tabelas mais extensas. • Cada match stage tem adicionais 106 RAM Blocks de 1k x 112b e 16 blocos de TCAM de 2k x 40b. ▫ A fração configurada para memórias de match, action e estatistica é configurável. ▫ Todas as leituras são realizadas em paralelo. • Em cada entrada de match table na RAM está associado um ponteiro para a action memory e size, instruction memory e um next table address. 19
  • 20. TCAM 640b 640b Physical Stage n Physical Stage 2 Logical Table 1 Ethertype Logical Table 6 L2D 8 UDP 2 VLA N 3 IPV4 5 IPV6 4 L2S 7 TCP SRAM HASH Physical Stage 1 RMT Logical to Physical Table Mapping 20 ACL UDPTCP L2S L2D IPV4 ETH VLAN IPV6 9 ACL Table Graph Action MatchTable Action MatchTable Action MatchTable Access Control List (ACL)
  • 21. Instruction ALU Match result Action Processing Model 21 HeaderIn FieldField Data HeaderOut
  • 22. ALU VLIW Instructions Match result Modeled as Multiple VLIW CPUs per Stage ALU ALU ALU ALU ALU ALU ALU ALU 22
  • 25. Hardware Implementation • 64 x 10Gb ports ▫ 960M packets/second ▫ 1GHz pipeline • Packet header Parser ▫ 4Kb vector ▫ 256*40b TCAM, • Physical Stages N=32 ▫ 106 blocks SRAM with 1K*112bits: for 80bits width Hash ▫ 16 blocks TCAM with 2K*40bits ▫ Total:370Mb SRAM, 40Mb TCAM, support combine in depth or width • Action units – 200+ for each stage – Each for one field – 7k+ units total • Queues – 2K queues per port – 4 levels of hierarchy – Allowing various combinations of deficit round robin, hierarchical fair queuing, token buckets, priorities.
  • 26. Cost of Configurability: Comparison with Conventional Switch • Many functions identical: I/O, data buffer, queueing… • Make extra functions optional: statistics • Memory dominates area ▫ Compare memory area/bit and bit count • RMT must use memory bits efficiently to compete on cost • Techniques for flexibility 26
  • 27. Chip Comparison with Fixed Function Switches Section Area % of chip Extra Cost IO, buffer, queue, CPU, etc 37% 0.0% Match memory & logic 54.3% 8.0% VLIW action engine 7.4% 5.5% Parser + deparser 1.3% 0.7% Total extra area cost 14.2% 27 Section Power % of chip Extra Cost I/O 26.0% 0.0% Memory leakage 43.7% 4.0% Logic leakage 7.3% 2.5% RAM active 2.7% 0.4% TCAM active 3.5% 0.0% Logic active 16.8% 5.5% Total extra power cost 12.4% Area Power
  • 28. Conclusion • How do we design a flexible chip? ▫ The RMT switch model ▫ Bring processing close to the memories:  pipeline of many stages ▫ Bring the processing to the wires:  224 action CPUs per stage • How much does it cost? ▫ 15% more than existing switches 28