1. DevoFlow: Scaling Flow Management for High-Performance Networks
Andrew R. Curtis (University of Waterloo); Jeffrey C. Mogul, Jean Tourrilhes,
Praveen Yalagandula, Puneet Sharma, Sujata Banerjee (HP Labs). SIGCOMM 2011.
Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao
Mar. 22nd, 2012
2. Motivation
• SDN / OpenFlow can enable per-flow
management… however:
• What are the costs and limitations?
• Does a network-wide logical graph require
collecting statistics for all flows, all the time?
• Are there problems beyond the controller's
scalability?
• Does enhancing controller performance /
scalability solve every problem?
3. DevoFlow Contributions
• Characterize overheads of implementing
OpenFlow on switches
• Evaluate flow mgmt capability in a data
center network environment
• Propose DevoFlow to enable scalable flow
mgmt by balancing
– Network control
– Statistics collection
– Overheads
– Switch functions and controller loads
4. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
5. Benefits
• Flexible policies w/o switch-by-switch config.
• Network graph and visibility, stat.s collection
• Enables traffic engineering and network mgmt
• OpenFlow switches are relatively simple
• Accelerates innovation:
– VL2, PortLand: new architectures, virtualized addressing
– Hedera: flow scheduling
– ElasticTree: energy-proportional networking
• However, the overheads have not been carefully examined
6. Bottlenecks
• Root cause: OpenFlow excessively couples
central control and complete visibility
• Controller bottleneck: can be scaled out with dist. systems
• Switch bottleneck:
– Limited BW between data- and control-plane
– Enormous flow tables, too many entries
– Control and stat.s pkts compete for BW
– Extra delays and latencies
• The switch bottleneck has not been well studied
7. Dilemma
• Control dilemma:
– Role of the controller: visibility and mgmt capability;
however, per-flow setup is too costly
– Wildcard / hash-based flow matching:
much less load, but no effective control
• Statistics-gathering dilemma:
– Pull-based mechanism: counters of all flows
give full visibility but demand high BW
– Wildcard counter aggregation: far fewer entries,
but loses track of elephant flows
• DevoFlow aims to strike a balance in between
8. Main Concept of DevoFlow
• Devolve most flow control to switches
• Maintain partial visibility
• Keep track of significant flows
• Default vs. special actions:
– Security-sensitive flows: categorically inspected
– Normal flows: may evolve into, or cover, flows that
become security-sensitive or significant
– Significant flows: get special attention
• Collect stat.s by sampling, triggers, and
approximate counters
9. Design Principles of DevoFlow
• Stay in the data plane by default
• Provide enough visibility:
– Esp. for significant flows & sec-sensitive flows
– Otherwise, aggregate or approximate stat.s
• Maintain simplicity of switches
10. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
11. Overheads: Control PKTs
An N-switch path
For a path with N switches: N+1 control pkts
• The first pkt of the flow goes to the controller
• N control messages to set up the N switches
Average length of a flow in 1997: 20 pkts
In a Clos / fat-tree DCN topology: 5 switches per path,
so 6 control pkts per flow
The smaller the flow, the higher the relative BW cost
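The arithmetic above can be sanity-checked with a minimal sketch (Python; the function names are mine, the figures are the slide's):

```python
# For a path with N switches, setting up one flow costs N+1 control packets:
# the first packet of the flow sent to the controller, plus N flow-table
# updates pushed to the N switches on the path.

def control_packets_per_flow(n_switches: int) -> int:
    """Control-plane packets needed to set up a single flow."""
    return n_switches + 1

def control_overhead_ratio(n_switches: int, flow_length_pkts: int) -> float:
    """Control packets as a fraction of the flow's own data packets."""
    return control_packets_per_flow(n_switches) / flow_length_pkts

# Slide figures: 5-switch Clos/fat-tree path, ~20-packet average flow.
print(control_packets_per_flow(5))      # 6 control packets per flow
print(control_overhead_ratio(5, 20))    # 0.3 -> 30% as many control pkts as data pkts
```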
12. Overheads: Flow Setup
• Switches have finite BW between the data and control
planes, i.e. overheads between ASIC and CPU
• Setup capability: 275~300 flows/sec
• Similar to [30]
• In data centers: mean flow interarrival ~30 ms per server
• A rack w/ 40 servers → ~1300 flows/sec
• The whole data center generates far more
[43] R. Sherwood, G. Gibb, K.-K. Yap, G. Appenzeller, M. Casado,
N. McKeown, and G. Parulkar. Can the production network be
the testbed? In OSDI, 2010.
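A back-of-the-envelope sketch of the setup-rate mismatch, using only the numbers quoted on this slide (constant names are mine):

```python
# Offered flow-setup load at a top-of-rack switch vs. what the switch's
# control plane can actually install, using the slide's figures.

SWITCH_SETUP_CAPACITY = 300    # flows/sec the 5406zl can set up (~275-300 measured)
MEAN_INTERARRIVAL_S = 0.030    # ~30 ms mean flow interarrival per server [30]
SERVERS_PER_RACK = 40

rack_flow_rate = SERVERS_PER_RACK / MEAN_INTERARRIVAL_S
print(rack_flow_rate)                            # ~1333 new flows/sec at one rack
print(rack_flow_rate / SWITCH_SETUP_CAPACITY)    # ~4.4x more than the switch can handle
```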
17. Overheads: Gathering Stat.s
• [30]: most long-lived flows last only a few sec
• Counters: (pkts, bytes, duration)
• Push-based: sent to the controller when the flow ends
• Pull-based: fetched actively by the controller
• 88F bytes for F flows
• On the 5406zl switch:
entries: 1.5K wildcard match / 13K exact match,
~1.3 MB in total; 2 fetches/sec ≈ 17 Mbps
Not fast enough, and it consumes a lot of BW!
[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The
Nature of Data Center Traffic: Measurements & Analysis. In
Proc. IMC, 2009.
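The statistics-pull numbers follow from the 88-bytes-per-flow figure; a small sketch of that arithmetic (names are mine; the raw byte count lands in the same ballpark as the slide's 17 Mbps):

```python
# Size of a full statistics pull on the 5406zl, assuming ~88 bytes of counter
# data per flow entry (as stated on the slide).

BYTES_PER_FLOW_STATS = 88
WILDCARD_ENTRIES = 1_500
EXACT_MATCH_ENTRIES = 13_000

def pull_size_bytes(num_entries: int) -> int:
    return num_entries * BYTES_PER_FLOW_STATS

full_pull = pull_size_bytes(WILDCARD_ENTRIES + EXACT_MATCH_ENTRIES)
print(full_pull / 1e6)            # ~1.28 MB per full-table pull

# Pulling the full table twice per second keeps a sustained load of roughly
# 20 Mbit/s of counter payload on the control channel (slide quotes ~17 Mbps).
print(full_pull * 2 * 8 / 1e6)
```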
18. Overheads: Gathering Stat.s
• ~2.5 sec to pull 13K entries
• ~1 sec to pull 5,600 entries
• ~0.5 sec to pull 3,200 entries
19. Overheads: Gathering Stat.s
• Per-flow setup generates too many entries
• The more entries, the longer the controller takes to fetch them
• The longer the fetch, the longer the control loop
• Hedera assumes a 5-sec control loop,
BUT its workload is too ideal (Pareto distribution)
• On the VL2 workload, a 5-sec loop improves only 1~5%
over ECMP
• Per [41], the loop must be under 0.5 sec to do better
[41] C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D. Wischik,
and M. Handley. Data center networking with multipath TCP.
In HotNets, 2010.
20. Overheads: Competition
• Flow setups and stat.s-pulling compete for BW
• Timely stat.s are required for scheduling
• Switch flow entries:
– OpenFlow wildcard rules live in TCAMs, consuming lots of
power & space
– Rules match on 10 header fields, 288 bits each
– vs. only 60 bits for a trad. Ethernet forwarding entry
• Per-flow entries vs. per-host entries
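A small sketch of the TCAM-cost comparison implied by these numbers (the host and flow counts in the second half are hypothetical, only to illustrate per-flow vs. per-host entries):

```python
# Wildcard OpenFlow rules match on the 10-tuple and are 288 bits wide,
# versus ~60 bits for a traditional Ethernet forwarding entry.

OPENFLOW_RULE_BITS = 288
ETHERNET_ENTRY_BITS = 60

print(OPENFLOW_RULE_BITS / ETHERNET_ENTRY_BITS)   # ~4.8x wider per entry

# Hypothetical ToR switch: 40 hosts, each with 10 concurrent flows.
hosts, flows_per_host = 40, 10
per_host_bits = hosts * ETHERNET_ENTRY_BITS                  # one entry per host
per_flow_bits = hosts * flows_per_host * OPENFLOW_RULE_BITS  # one entry per flow
print(per_flow_bits / per_host_bits)              # ~48x more TCAM bits
```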
22. Agenda
• OF Benefits, Bottlenecks, and Dilemmas
• Evaluation of Overheads
• DevoFlow
• Simulation Results
23. Mechanisms
• Control
– Rule cloning
– Local actions
• Statistics-gathering
– Sampling
– Triggers and reports
– Approximate counters
• Flow scheduler: like Hedera
• Multipath routing: based on a probability dist.,
enabling oblivious routing
24. Rule Cloning
• The ASIC clones a wildcard rule into an exact-
match rule for each new microflow
• The clone gets its own timeout, and its output
port can be chosen by probability (multipath)
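An illustrative sketch of what rule cloning looks like in the data plane (class and field names are mine, not DevoFlow's actual switch interface):

```python
import random

class WildcardRule:
    """A wildcard rule with the DevoFlow CLONE flag set."""
    def __init__(self, match_fn, out_ports, weights, timeout=10.0):
        self.match_fn = match_fn    # predicate over a packet's header fields
        self.out_ports = out_ports  # candidate output ports (multipath)
        self.weights = weights      # selection probabilities, e.g. by link capacity
        self.timeout = timeout      # idle timeout inherited by each clone

class Switch:
    def __init__(self, wildcard_rules):
        self.wildcard_rules = wildcard_rules
        self.exact_table = {}       # microflow 5-tuple -> chosen output port

    def forward(self, pkt):
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        if key in self.exact_table:               # a previously cloned rule matches
            return self.exact_table[key]
        for rule in self.wildcard_rules:
            if rule.match_fn(pkt):
                port = random.choices(rule.out_ports, rule.weights)[0]
                self.exact_table[key] = port      # ASIC installs the exact-match clone
                return port
        return None                               # table miss: would go to the controller
```

Every packet of the microflow after the first then hits the exact-match clone, so the controller never sees the flow unless a sample or trigger reports it.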
27. Local Actions
• Rapid re-routing: predefined fallback paths
recover almost immediately from failures
• Multipath support: ports chosen by a probability dist.,
adjusted by link capacity or load
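A sketch of these local actions under the same assumptions (function names are mine; the weights are just illustrative link capacities):

```python
import random

def pick_port(ports, weights, link_up, fallback_ports):
    """Select an output port locally, without contacting the controller."""
    port = random.choices(ports, weights)[0]     # multipath: weighted random choice
    if link_up(port):
        return port
    for fb in fallback_ports:                    # rapid re-routing: predefined fallbacks
        if link_up(fb):
            return fb
    return None                                  # all paths down: controller must act

# Example: three uplinks weighted 10G/10G/1G, port 3 as the predefined backup;
# here port 1 is pretended to be down.
print(pick_port([1, 2, 3], [10, 10, 1], lambda p: p != 1, [3]))
```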
28. Statistics-Gathering
• Sampling
– Pkt headers sent to the controller with 1/1000 prob.
• Triggers and reports
– Set a threshold per rule;
when exceeded, trigger a flow setup at the controller
• Approximate counters
– Maintain a list of the top-k largest flows
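A compact sketch of all three mechanisms (sampling, a per-rule trigger, and a top-k approximate counter); the structure is illustrative, not the switch's real interface:

```python
import heapq
import random

SAMPLE_PROB = 1 / 1000          # slide: headers sampled to the controller at 1/1000

def maybe_sample(pkt, send_to_controller):
    if random.random() < SAMPLE_PROB:
        send_to_controller(pkt["header"])

class TriggeredRule:
    """Per-rule byte threshold; firing asks the controller for a flow setup."""
    def __init__(self, threshold_bytes, report):
        self.bytes = 0
        self.threshold = threshold_bytes
        self.report = report
        self.fired = False

    def count(self, pkt_len):
        self.bytes += pkt_len
        if not self.fired and self.bytes > self.threshold:
            self.fired = True                    # the flow is now "significant"
            self.report(self.bytes)

def top_k_flows(byte_counts, k=10):
    """Approximate counters: keep only the k largest flows."""
    return heapq.nlargest(k, byte_counts.items(), key=lambda kv: kv[1])
```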
29. Implementation
• Not yet implemented in hardware
• Engineers confirm most mechanisms can be built
from existing functional blocks
• Provides some basic tools for SDN
• However, scaling questions remain:
what threshold? how to sample? at what rate?
• Multipath is the default on switches
• The controller samples or sets triggers to detect
elephants, then schedules them with a bin-packing algo.
30. Simulation
• How much can flow-scheduling overheads be
reduced while still achieving high performance?
• Custom-built flow-level simulator, calibrated with the
5406zl experiments
• Workloads generated:
– Reverse-engineered from [30] (MSR, 1500-server cluster)
– MapReduce shuffle stage, 128 MB to every other node
– A combination of the two
[30] S. Kandula, S. Sengupta, A. Greenberg, and P. Patel. The
Nature of Data Center Traffic: Measurements & Analysis. In
Proc. IMC, 2009.
42. Conclusion
• Per-flow control imposes too much overhead
• Balance between
– overheads and network visibility
– effective traffic engineering / network mgmt
could lead to further research directions
• Switches have limited resources
– Flow entries / control-plane BW
– Hardware capability / power consumption