Solving Congestion Problems in Storage Area Networks

Slow drain is a top concern for storage administrators, yet it remains one of the least understood problems. This slide deck explains the basic concept and the features Cisco MDS switches provide for detection, troubleshooting, and automatic recovery. Discover best practices for eliminating slow drain and other Storage Area Network (SAN) congestion problems. See the recent innovations from Cisco that enable you to quickly detect, resolve, and even prevent SAN congestion. Deploy these features today to have your own self-healing fabric.

Published in: Technology

Solving Congestion Problems in Storage Area Networks

  1. 1. Solving Congestion Problems in Storage Area Networks (SAN). Paresh Gupta, Technical Marketing Engineer; Ed Mazurek, Technical Leader, Services. February 2015
  2. 2. The world we live in: 2x the number of apps in 2014 over the previous year
  3. 3. The world we live in: 10x data growth by 2020
  4. 4. The world we live in: 2x the number of apps in 2014, 10x data growth by 2020, 1.5x IT professionals
  5. 5. Highlight: hardware-enhanced slow drain detection, troubleshooting, and automatic recovery (1 ms, 1 ms, immediate)
  6. 6. Understanding Slow Drain
  7. 7. Fibre Channel Flow Control: B2B Credits
      • B2B credits are not negotiated, just agreed to
      • Each side informs the other side of the number of buffer credits it has
      [Diagram: the storage disk N-port announces "I have 1 Rx B2B credit"; the Fibre Channel switch F-port answers "OK. I have 3 B2B credits". The switch F-port has three credits, the storage disk N-port has one.]
  8. 8. Fibre Channel Flow Control: Traffic Flow
      • The MDS Rx buffer queue is decremented by 1 B2B credit for each received frame
      • An R_RDY is sent to the sender once the frame occupying the buffer has been handled
      • For each frame sent, an R_RDY (B2B credit) should be returned
      • R_RDYs are not sent reliably; they can be corrupted or lost
      [Diagram: a storage disk (N-port) sends Frame1, Frame2, and Frame3 to a Fibre Channel switch (F-port), which returns R_RDY]
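The credit accounting described on the two slides above can be sketched from the sender's point of view. This is a minimal illustration, not switch firmware: the class name `B2BTransmitter` and its methods are hypothetical, and the only behaviour taken from the slides is that each transmitted frame consumes one credit and each received R_RDY returns one.

```python
class B2BTransmitter:
    """Sender-side view of buffer-to-buffer credits on one link (illustrative only)."""

    def __init__(self, peer_rx_credits: int) -> None:
        # The peer announced this many receive buffers at login.
        self.credits_total = peer_rx_credits
        self.credits_remaining = peer_rx_credits

    def send_frame(self) -> bool:
        """Send one frame if a credit is available; return whether it was sent."""
        if self.credits_remaining == 0:
            return False  # must wait for an R_RDY
        self.credits_remaining -= 1
        return True

    def receive_r_rdy(self) -> None:
        """The peer freed a buffer, so one credit comes back (never above the total)."""
        if self.credits_remaining < self.credits_total:
            self.credits_remaining += 1


tx = B2BTransmitter(peer_rx_credits=3)
results = [tx.send_frame() for _ in range(4)]
print(results)          # [True, True, True, False]
tx.receive_r_rdy()
print(tx.send_frame())  # True
```

If an R_RDY is lost in transit, `receive_r_rdy` is never called for that frame and the sender permanently operates with one credit fewer, which is why the slides stress that R_RDYs are not sent reliably.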
  9. 9. Lossless Fibre Channel fabric
      • Disk 1 sends a frame to Server 1
      • Switch 1 sends R_RDY after it transmits the frame to Switch 2
      • Switch 2 sends R_RDY after it transmits the frame to Server 1
      • Server 1 sends R_RDY after the frame is consumed by the HBA
      [Diagram: Disk 1, Switch 1, Switch 2, Server 1; frames flow left to right and an R_RDY is returned on each hop]
  10. 10. Lossless Fibre Channel fabric
      • Server 1 cannot process frames, so it does not return R_RDY
      • No B2B credits are available on the ports connected to Server 1 and Disk 1
      • No B2B credits are available on the ISL ports
      • Disk 1 stops transmitting, so no frame is dropped and the fabric remains lossless
      [Diagram: Disk 1, Switch 1, Switch 2, Server 1; every buffer holds a frame and backpressure propagates from Server 1 back to Disk 1]
  11. 11. Slow Drain situation
      • B2B credits are exhausted on the ISL
      • No R_RDY is sent to Disk 1 or to Disk 2
      • The "slow Server 1" affects the flow from Disk 2 to Server 2
      [Diagram: Disk 1 and Disk 2 attach to Switch 1, Server 1 and Server 2 attach to Switch 2; backpressure from Server 1 fills the shared ISL buffers]
  12. 12. Slow Drain situation
      • One slow device impacts all other devices sharing the same switches and ISL
      [Diagram: Server 1 is the slow node; Disk 1, Disk 2, and Server 2 are the impacted nodes]
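The slow drain effect above can be reproduced with a toy model. The sketch below is an assumption-laden simplification rather than a description of MDS internals: one frame is offered to the ISL per tick, destinations alternate between two servers, Server 2 always drains at once, and a slow Server 1 accepts a single frame and then never returns a credit. The function name `simulate_shared_isl` and its parameters are hypothetical.

```python
def simulate_shared_isl(isl_credits: int, ticks: int, server1_is_slow: bool):
    """Toy model: two flows share the ISL receive buffers on Switch 2."""
    isl_buffer = []                      # frames held in Switch 2, tagged by destination
    delivered = {"server1": 0, "server2": 0}
    server1_credits = 1                  # credits Switch 2 holds towards Server 1
    next_dest = "server1"

    for _ in range(ticks):
        # Switch 1 may use the ISL only while a buffer (credit) is free.
        if len(isl_buffer) < isl_credits:
            isl_buffer.append(next_dest)
            next_dest = "server2" if next_dest == "server1" else "server1"

        still_queued = []
        for dest in isl_buffer:
            if dest == "server2":
                delivered[dest] += 1     # healthy device, credit returned immediately
            elif server1_credits > 0:
                server1_credits -= 1
                delivered[dest] += 1
                if not server1_is_slow:
                    server1_credits += 1  # a healthy Server 1 returns R_RDY
            else:
                still_queued.append(dest)  # frame stays and keeps occupying an ISL buffer
        isl_buffer = still_queued

    return delivered, len(isl_buffer)


print(simulate_shared_isl(isl_credits=4, ticks=20, server1_is_slow=False))
# ({'server1': 10, 'server2': 10}, 0)
print(simulate_shared_isl(isl_credits=4, ticks=20, server1_is_slow=True))
# ({'server1': 1, 'server2': 4}, 4)
```

In the slow case the four ISL buffers end up holding only frames for Server 1, so the healthy flow to Server 2 stops after four frames even though Server 2 itself never misbehaved.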
  13. 13. Reasons for Slow Drain
      • Edge devices
        • Server performance problems: application or OS
        • Host bus adapter (HBA) problems: driver or physical failure
        • Speed mismatches: one fast device and one slow device
        • Non-graceful virtual machine exit on a virtualized server, resulting in packets held in HBA buffers
        • Storage subsystem performance problems, including overload
      • Inter-Switch Links (ISLs)
        • Lack of B2B credits for the distance the ISL is traversing (example: 4 credits per km at 8 Gbps)
        • The existence of slow drain edge devices
        • Edge devices with faster speeds than the ISLs, even when port-channeled
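The ISL distance rule on the slide above lends itself to a short worked calculation. The only figure taken from the slide is 4 credits per km at 8 Gbps; scaling that figure linearly with link speed is an assumption made here for illustration, and the helper name `isl_credits_needed` is hypothetical. Real requirements also depend on frame size, so treat the result as a rough lower bound.

```python
import math

CREDITS_PER_KM_AT_8G = 4  # figure quoted on the slide


def isl_credits_needed(distance_km: float, speed_gbps: float) -> int:
    """Estimate B2B credits needed to keep a long ISL busy.

    Assumes the per-km requirement scales linearly with speed
    (an assumption, not stated in the deck).
    """
    credits_per_km = CREDITS_PER_KM_AT_8G * speed_gbps / 8.0
    return math.ceil(distance_km * credits_per_km)


print(isl_credits_needed(10, 8))    # 40
print(isl_credits_needed(10, 16))   # 80
print(isl_credits_needed(2.5, 4))   # 5
```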
  14. 14. Cisco MDS Architecture
  15. 15. Frame & credit processing in MDS switch
      1. Initiator sends FC frame
      2. MDS receives the frame in its entirety and stores it
      3. Frame is transmitted to the VOQ
      4. XBAR interface requests a grant from the Arbiter to transmit the frame to the egress port via the XBAR
      5. Arbiter grants the request to the XBAR interface to forward the frame; the grant is only sent when the egress port has buffer space available
      6. FC frame is forwarded to the XBAR, then R_RDY is sent back since the buffer is now free
      7. FC frame is forwarded to the egress line card
      8. ASIC forwards the frame to the target
      9. Credit is returned to the Arbiter
      [Diagram: Line Card 1 and Line Card 2 (port, VOQ, XBAR interface), two fabric modules (XBAR), and the Arbiter on the active supervisor]
  16. 16. Cisco MDS architecture advantage
      • Throughput & latency: consistent
      • Performance at different traffic loads & types: predictable
      • Drops corrupt frames by CRC checking at all stages
      • Non-blocking arbitrated crossbar architecture
      • Never drops a good frame under congestion
      [Diagram: Line Card 1 and Line Card 2 (port, VOQ, XBAR interface), two fabric modules (XBAR), and the Arbiter on the active supervisor]
  17. 17. Slow Drain Detection
  18. 18. Slow & Stuck Port
      • Stuck port
        • Credits unavailable on the port for an extended duration
        • Traffic does not flow at all
        • A separate counter is maintained for stuck ports
      • Slow port
        • Credits are returned slowly
        • Traffic does not flow at line rate
        • A counter is maintained if credits are unavailable for 100 ms
  19. 19. Slow Drain Troubleshooting
  20. 20. Credit availability
      • Credit and remaining credit
        • Credits: the number agreed initially
        • Remaining credit: a dynamic counter; this many frames can still be sent
      • Credit transition to zero
        • A counter is incremented whenever the port hits zero credits
        • Maintained as a hardware statistic
  21. 21. Frame information
      • Display frames in the ingress queue: real-time display of frames in ingress queues
      • Display dropped frame info: key information about frames dropped due to timeout, such as Source FCID (SID) and Destination FCID (DID)
  22. 22. On Board Failure Logging (OBFL)
      • Each line card logs events to an NVRAM buffer
      • Events are timestamped
      • Logged per line card: error-stats, flow-control timeouts, request-timeouts
  23. 23. DCNM Enhancements
  24. 24. DCNM Slow Drain Enhancements
      • Automates troubleshooting: collects data from the whole fabric at once
      • Automates collection: from hours of collection to minutes
      • Reduces false positives: prioritizes the ports with the highest-severity counters
      • Shows fluctuations in counters: graphs counters and lets the user zero in on specific counters
  25. 25. Slow Drain Automatic Recovery
  26. 26. Stuck Port Recovery
      • Transmitting port
        • Credits unavailable (F port: 1 second; E port: 1.5 seconds)
        • Transmits a Link Reset (LR)
        • If a Link Reset Response (LRR) is received, credits are replenished
        • If no LRR is received, the port fails
        • A counter is incremented
      • Receiving port
        • On receiving the LR, checks whether its input buffers are empty
        • If the input buffers are not empty within 90 ms, the "LR Rcvd B2B" condition occurs and the link fails with reason "Link failure Link Reset failed nonempty Recv queue"
        • This is an indication of upstream congestion
      [Diagram: MDS1 (transmitting port) and MDS2 (receiving port) with full buffers on both sides]
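The recovery rules above can be summarised as three small decision functions. This is a sketch under stated assumptions: the thresholds (100 ms for a slow port, 1 s and 1.5 s for stuck F and E ports, 90 ms for the receive-buffer check) come from the slides, while the function names, the return strings, and the treatment of the exact boundary values are hypothetical.

```python
from typing import Tuple

SLOW_THRESHOLD_MS = 100                      # from the "Slow & Stuck Port" slide
STUCK_THRESHOLD_MS = {"F": 1000, "E": 1500}  # from the "Stuck Port Recovery" slide
LR_DRAIN_LIMIT_MS = 90


def classify_port(port_type: str, zero_credit_ms: int) -> str:
    """Classify a port by how long it has been at zero Tx credits."""
    if zero_credit_ms >= STUCK_THRESHOLD_MS[port_type]:
        return "stuck"
    if zero_credit_ms >= SLOW_THRESHOLD_MS:
        return "slow"
    return "ok"


def transmitter_after_lr(lrr_received: bool, credits_total: int) -> Tuple[str, int]:
    """Outcome on the transmitting port once it has sent a Link Reset."""
    if lrr_received:
        return ("up", credits_total)  # credits replenished to the agreed number
    return ("failed", 0)


def receiver_on_lr(ms_until_buffers_empty: int) -> str:
    """Outcome on the receiving port, which must drain its input buffers first."""
    if ms_until_buffers_empty <= LR_DRAIN_LIMIT_MS:
        return "send LRR"
    return "link failure: LR Rcvd B2B"


print(classify_port("F", 1200))       # stuck
print(classify_port("E", 1200))       # slow
print(transmitter_after_lr(True, 8))  # ('up', 8)
print(receiver_on_lr(120))            # link failure: LR Rcvd B2B
```

The same 1200 ms of zero credits is classified differently on F and E ports because ISLs are given the longer 1.5 s allowance.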
  27. 27. Congestion Drop
      • MDS timestamps each received frame
      • A frame is dropped if it cannot be delivered to the egress port within the timeout
      • The drop is logged
      • The timeout can be configured from 100 ms to 500 ms (default 500 ms)
      • Lowering the timeout makes frames time out sooner and reduces the effects of slow drain devices
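The congestion-drop rule above amounts to an age check against the receive timestamp. The following sketch assumes a simple list as the queue and treats a frame whose age equals the timeout as expired; the names `Frame` and `apply_congestion_drop` are hypothetical, while the 100 to 500 ms range and the 500 ms default come from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    frame_id: int
    received_at_ms: int  # timestamp applied when the switch received the frame


def apply_congestion_drop(
    queue: List[Frame], now_ms: int, timeout_ms: int = 500
) -> Tuple[List[Frame], List[Frame]]:
    """Split queued frames into (kept, dropped) according to their age."""
    if not 100 <= timeout_ms <= 500:
        raise ValueError("timeout must be between 100 and 500 ms")
    kept, dropped = [], []
    for frame in queue:
        age_ms = now_ms - frame.received_at_ms
        (dropped if age_ms >= timeout_ms else kept).append(frame)
    return kept, dropped


queue = [Frame(1, 0), Frame(2, 300), Frame(3, 450)]
kept, dropped = apply_congestion_drop(queue, now_ms=520)
print([f.frame_id for f in dropped])  # [1]
kept, dropped = apply_congestion_drop(queue, now_ms=520, timeout_ms=200)
print([f.frame_id for f in dropped])  # [1, 2]
```

With the lower timeout the second frame is released as well, which frees its buffer sooner at the cost of dropping it.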
  28. 28. no-credit-drop
      • Frames are dropped in the egress queue if credits are unavailable for the no-credit-drop timeout
      [Diagram: Disk 1 and Disk 2 attach to MDS 1, Server 1 and Server 2 attach to MDS 2; frames are dropped from the egress queue of the slow port, R_RDYs flow again, and backpressure is released on every hop]
  29. 29. HW Enhancements
  30. 30. Supported Software and Hardware
      • HW + SW slow drain support (6.2(9) onwards): MDS 9148S, MDS 9250i, and the 48-port 16G FC line card (DS-X9448-768K9) in MDS 9710 and MDS 9706
      • SW (only) slow drain support: MDS9222i, MDS9148, and the 32/48-port 8G FC line card in MDS9513, MDS9509, and MDS9506
  31. 31. HW Assistance Explained
      • Software-based: detection and action run in the control plane with 100 ms polling
      • HW assistance: detection and action run in the data plane
  32. 32. no-credit-drop: HW Assistance
      • Detection range: 1-500 ms instead of 100-500 ms
        • Devices slower than 100 ms are handled
        • Reduced traffic drop at high speeds
      • Granularity: reduced from 100 ms to 1 ms
        • Enhanced precision
        • Any value from 1 to 500 ms (earlier: 100, 200, etc.; now: 101, 102, etc.)
        • No missed transient conditions
      • no-credit-drop action: immediate (ns)
        • Up to 99 ms of early action
      • Recovery from the no-credit-drop condition: immediate (ns)
        • Up to 99 ms of early recovery
      • At least 60% incremental performance
  33. 33. Slow Port Monitoring
      • Shows the real-time delay of R_RDY
      • Monitoring is done at 1 ms
      • Done in hardware; no overhead on the CPU
      • Recommendation: always turn it on

      Mds9706# show process creditmon slowport-monitor-events

      Module: 01      Slowport Detected: YES
      =====================================================================
      Interface = fc1/18
      ------------------------------------------------------------
      | admin | slowport  | oper  | Timestamp
      | delay | detection | delay |
      | (ms)  | count     | (ms)  |
      ------------------------------------------------------------
      | 1     | 0         | 9     | Wed Jul 2 19:47:35.038 2014
      | 1     | 128       | 9     | Wed Jul 2 19:47:19.922 2014
      | 1     | 127       | 4     | Wed Jul 2 19:47:19.618 2014
      | 1     | 119       | 10    | Wed Jul 2 19:47:19.518 2014
      | 1     | 109       | 10    | Wed Jul 2 19:47:19.418 2014
      | 1     | 101       | 10    | Wed Jul 2 19:47:19.318 2014
      | 1     | 100       | 4     | Wed Jul 2 19:47:19.118 2014
      | 1     | 93        | 10    | Wed Jul 2 19:47:19.017 2014
      | 1     | 83        | 10    | Wed Jul 2 19:47:18.917 2014
      | 1     | 74        | 12    | Wed Jul 2 19:47:18.818 2014

      Column notes: admin delay is the delay configured via slow-port-monitor; slowport detection count is the number of times the delay was detected; oper delay is the actual delay seen by the port; the timestamps cover the last 10 times the delay was observed.
  34. 34. Troubleshooting Slow Drain: Methodology
      Cisco recommends troubleshooting slow drain in the following order:
      • Level 3: Extreme Delay
      • Level 2: Retransmission
      • Level 1: Latency
  35. 35. Troubleshooting Slow Drain: Methodology, Follow Congestion to Source
      • If Rx congestion is found, find the ports communicating with this port that have Tx congestion
        • Zoning defines which devices communicate with this port
        • Understand the topology
      • If the port communicating with the port showing Rx congestion is FCIP
        • Check for TCP retransmits
        • Check for overutilization of FCIP
      [Diagram: an F port with 0 Rx credits remaining and an E port with 0 Tx credits remaining]
  36. 36. Troubleshooting Slow Drain: Methodology, Follow Congestion to Source
      • If Tx congestion is found
        • On an F port, the attached device is the slow drain device
        • On an E port, go to the adjacent switch and continue troubleshooting
      • Continue to track through the fabric until the destination F port is discovered
      [Diagram: F and E ports across two switches, with 0 Rx credits remaining on the ingress side and 0 Tx credits remaining on the egress side]
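The "follow congestion to source" procedure above is essentially a graph walk. The sketch below assumes the fabric has already been reduced to a dictionary of switches and ports with a Tx-congestion flag per port; that data model, the switch and port names, and the function `find_slow_drain_devices` are all hypothetical and are not derived from any MDS or DCNM API.

```python
from typing import Dict, List, Tuple

# Each port: name, type ("F" or "E"), tx_congested flag, and its peer
# (a device name for F ports, a switch name for E ports).
Fabric = Dict[str, List[dict]]


def find_slow_drain_devices(fabric: Fabric, start_switch: str) -> List[Tuple[str, str, str]]:
    """Follow Tx congestion hop by hop until congested F ports are reached."""
    culprits = []
    visited = set()
    to_visit = [start_switch]
    while to_visit:
        switch = to_visit.pop()
        if switch in visited:
            continue
        visited.add(switch)
        for port in fabric[switch]:
            if not port["tx_congested"]:
                continue
            if port["type"] == "F":
                # Tx congestion on an F port: the attached device is the slow drain device.
                culprits.append((switch, port["name"], port["peer"]))
            else:
                # Tx congestion on an E port: continue on the adjacent switch.
                to_visit.append(port["peer"])
    return culprits


fabric = {
    "storage-edge": [
        {"name": "fc1/1", "type": "E", "tx_congested": True, "peer": "core1"},
        {"name": "fc1/2", "type": "F", "tx_congested": False, "peer": "array1"},
    ],
    "core1": [
        {"name": "fc2/1", "type": "E", "tx_congested": True, "peer": "host-edge"},
    ],
    "host-edge": [
        {"name": "fc3/7", "type": "F", "tx_congested": True, "peer": "host42"},
        {"name": "fc3/8", "type": "F", "tx_congested": False, "peer": "host43"},
    ],
}
print(find_slow_drain_devices(fabric, "storage-edge"))
# [('host-edge', 'fc3/7', 'host42')]
```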
  37. 37. Port monitoring
      On an MDS event, port monitoring can:
      • Generate alarms
      • Flap the port
      • Error-disable the port
  38. 38. Port-monitor: Alerting and Action (portguard)
      • Port-monitor sends SNMP alerts and can also take a portguard action
      • Adding portguard to errdisable or flap a port can help the switch recover from problems automatically
        • Should be done on access (F) ports only
        • Use separate access (F) and trunk (E) policies
        • Warning: currently access (F) ports include F port-channels and trunks. Consequently, portguard actions should be avoided on these switches.
      • The system timeouts congestion-drop and no-credit-drop should also be considered
  39. 39. Port-monitor alerting: sample all ports policy
      The policy applies to access (F) and trunk (E) ports. Counters listed with "no monitor" are not monitored.

      port-monitor name AllPorts
        port-type all
        no monitor counter link-loss
        no monitor counter sync-loss
        no monitor counter signal-loss
        no monitor counter invalid-words
        no monitor counter invalid-crc
        counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
        counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter timeout-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
        counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
        counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
        no monitor counter rx-datarate
        no monitor counter tx-datarate
        no monitor counter err-pkt-from-port
        no monitor counter err-pkt-to-xbar
        no monitor counter err-pkt-from-xbar

      Note: the above monitors 6 slow drain counters and does not monitor 10 others.
  40. 40. Port-monitor alerting: sample all ports policy (activated)

      9513(config)# port-monitor activate AllPorts
      9513(config)# show port-monitor active
      Policy Name  : AllPorts
      Admin status : Active
      Oper status  : Active
      Port type    : All Ports
      ------------------------------------------------------------------------------------------------------------------
      Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
      -------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
      TX Discards              Delta      60        50                4      10                 4      Not enabled
      LR RX                    Delta      60        5                 4      1                  4      Not enabled
      LR TX                    Delta      60        5                 4      1                  4      Not enabled
      Timeout Discards         Delta      60        50                4      10                 4      Not enabled
      Credit Loss Reco         Delta      60        1                 4      0                  4      Not enabled
      TX Credit Not Available  Delta      1         10                4      0                  4      Not enabled
      ------------------------------------------------------------------------------------------------------------------
  41. 41. Port-monitor portguard: sample access (F) port policy
      • The following adds portguard to the timeout-discards and credit-loss-reco counters and raises their rising thresholds slightly:

      port-monitor name AccessPorts
        port-type access
        no monitor counter link-loss
        no monitor counter sync-loss
        no monitor counter signal-loss
        no monitor counter invalid-words
        no monitor counter invalid-crc
        counter tx-discards poll-interval 60 delta rising-threshold 50 event 4 falling-threshold 10 event 4
        counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter timeout-discards poll-interval 60 delta rising-threshold 60 event 4 falling-threshold 10 event 4 portguard errordisable
        counter credit-loss-reco poll-interval 60 delta rising-threshold 4 event 4 falling-threshold 0 event 4 portguard errordisable
        counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
        no monitor counter rx-datarate
        no monitor counter tx-datarate
        no monitor counter err-pkt-from-port
        no monitor counter err-pkt-to-xbar
        no monitor counter err-pkt-from-xbar

      • timeout-discards: error-disable the port when 60 timeout discards happen in 60 seconds
      • credit-loss-reco: error-disable the port when 4 credit loss recovery events occur in 60 seconds
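The rising and falling thresholds in the policies above behave like an alarm with hysteresis on the per-interval delta of a counter. The sketch below is an approximation written for illustration: it assumes the alarm rises when the delta reaches the rising threshold and clears when it falls to the falling threshold or below, and the function name `evaluate_counter` and the sample values are hypothetical. On a real switch an error-disabled port stops counting, which this toy model ignores.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str, Optional[str]]  # (poll index, "rising"/"falling", portguard action)


def evaluate_counter(
    samples: List[int], rising: int, falling: int, portguard: Optional[str] = None
) -> List[Event]:
    """Turn cumulative counter samples (one per poll interval) into threshold events."""
    events: List[Event] = []
    alarmed = False
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if not alarmed and delta >= rising:
            alarmed = True
            events.append((i, "rising", portguard))
        elif alarmed and delta <= falling:
            alarmed = False
            events.append((i, "falling", None))
    return events


# Cumulative timeout-discards sampled every 60 s (made-up values).
timeout_discards = [0, 5, 70, 140, 145, 145]
print(evaluate_counter(timeout_discards, rising=60, falling=10, portguard="errordisable"))
# [(2, 'rising', 'errordisable'), (4, 'falling', None)]
```

The third poll (delta 70) produces no second rising event because the alarm is already active, which is the point of pairing each rising threshold with a falling one.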
  42. 42. Port-monitor portguard: sample trunk (E) port policy

      port-monitor name ISLPorts
        port-type trunks
        no monitor counter link-loss
        no monitor counter sync-loss
        no monitor counter signal-loss
        no monitor counter invalid-words
        no monitor counter invalid-crc
        counter tx-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
        counter lr-rx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter lr-tx poll-interval 60 delta rising-threshold 5 event 4 falling-threshold 1 event 4
        counter timeout-discards poll-interval 60 delta rising-threshold 100 event 4 falling-threshold 10 event 4
        counter credit-loss-reco poll-interval 60 delta rising-threshold 1 event 4 falling-threshold 0 event 4
        counter tx-credit-not-available poll-interval 1 delta rising-threshold 10 event 4 falling-threshold 0 event 4
        no monitor counter rx-datarate
        no monitor counter tx-datarate
        no monitor counter err-pkt-from-port
        no monitor counter err-pkt-to-xbar
        no monitor counter err-pkt-from-xbar
  43. 43. Port-monitor portguard: both policies when activated

      MDS9513# show port-monitor active
      Policy Name  : ISLPorts
      Admin status : Active
      Oper status  : Active
      Port type    : All Trunk Ports
      ------------------------------------------------------------------------------------------------------------------
      Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
      -------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
      TX Discards              Delta      60        100               4      10                 4      Not enabled
      LR RX                    Delta      60        5                 4      1                  4      Not enabled
      LR TX                    Delta      60        5                 4      1                  4      Not enabled
      Timeout Discards         Delta      60        100               4      10                 4      Not enabled
      Credit Loss Reco         Delta      60        1                 4      0                  4      Not enabled
      TX Credit Not Available  Delta      1         10                4      0                  4      Not enabled
      ------------------------------------------------------------------------------------------------------------------
      Policy Name  : AccessPorts
      Admin status : Active
      Oper status  : Active
      Port type    : All Access Ports
      ------------------------------------------------------------------------------------------------------------------
      Counter                  Threshold  Interval  Rising Threshold  event  Falling Threshold  event  PMON Portguard
      -------                  ---------  --------  ----------------  -----  -----------------  -----  --------------
      TX Discards              Delta      60        50                4      10                 4      Not enabled
      LR RX                    Delta      60        5                 4      1                  4      Not enabled
      LR TX                    Delta      60        5                 4      1                  4      Not enabled
      Timeout Discards         Delta      60        60                4      10                 4      Error Disable
      Credit Loss Reco         Delta      60        4                 4      0                  4      Error Disable
      TX Credit Not Available  Delta      1         10                4      0                  4      Not enabled
      ------------------------------------------------------------------------------------------------------------------
  44. 44. Port-monitor: DCNM event log
  45. 45. Case Study 1: Stuck Port / Credit Loss Due to Bad Physical Connection
      • Creditmon is a process that runs periodically in each line card
      • It checks for transmit credits at zero
        • F port at 0 Tx credits for 1 second
        • E port at 0 Tx credits for 1.5 seconds
      • Credit loss recovery is then invoked
      • Can occur due to faulty hardware in the connection to the device
        • Frames are dropped due to errors (CRC, etc.)
        • No credits are returned for corrupted frames; this eventually causes repeated credit loss
      [Timeline: 0 sec, no credits (stuck); 1/1.5 sec, LR sent and LRR received; +60 ms, credits restored and the port resumes normal operation]
  46. 46. Troubleshooting: show logging onboard starttime <mm-dd-yy-00:00:00> error-stats
      • Counters are polled every 20 seconds
      • When a counter value changes, it is included
      • Several different counters are included in error-stats:
        • Timeout drops
        • Credit loss recovery
        • Tx/Rx credit not available (100 ms)
        • Force timeout on/off

      mds9710-2# show logging onboard error-stats
      ----------------------------
       Module: 1
      ----------------------------
      --------------------------------------------------------------------------------
       ERROR STATISTICS INFORMATION FOR DEVICE DEVICE: FCMAC
      --------------------------------------------------------------------------------
       Interface |                                |        | Time Stamp
       Range     | Error Stat Counter Name        | Count  | MM/DD/YY HH:MM:SS
      --------------------------------------------------------------------------------
       fc1/13    | F16_TMM_TOLB_TIMEOUT_DROP_CNT  | 242618 | 04/14/14 12:17:58
       fc1/13    | FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO | 124    | 04/14/14 12:17:58
       fc1/13    | FCP_SW_CNTR_CREDIT_LOSS        | 124    | 04/14/14 12:17:58
       fc1/13    | F16_TMM_TOLB_TIMEOUT_DROP_CNT  | 201650 | 04/14/14 12:17:38
       fc1/13    | FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO | 108    | 04/14/14 12:17:38
       fc1/13    | FCP_SW_CNTR_CREDIT_LOSS        | 107    | 04/14/14 12:17:38
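Because OBFL records cumulative counts, the interesting figure is usually the change between two entries. The sketch below computes those deltas for the six rows shown in the slide above; the rows are copied from the slide, while the tuple layout and the helper name `counter_deltas` are assumptions made for this example and the code does not parse real CLI output.

```python
from datetime import datetime

# (interface, counter name, cumulative count, timestamp) as shown on the slide.
ROWS = [
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 242618, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 124, "04/14/14 12:17:58"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 124, "04/14/14 12:17:58"),
    ("fc1/13", "F16_TMM_TOLB_TIMEOUT_DROP_CNT", 201650, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO", 108, "04/14/14 12:17:38"),
    ("fc1/13", "FCP_SW_CNTR_CREDIT_LOSS", 107, "04/14/14 12:17:38"),
]


def counter_deltas(rows):
    """Return {(interface, counter): (count delta, seconds elapsed)} between oldest and newest entry."""
    samples = {}
    for interface, name, count, stamp in rows:
        when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        samples.setdefault((interface, name), []).append((when, count))
    deltas = {}
    for key, entries in samples.items():
        entries.sort()  # oldest first
        (t_old, c_old), (t_new, c_new) = entries[0], entries[-1]
        deltas[key] = (c_new - c_old, (t_new - t_old).total_seconds())
    return deltas


for (interface, name), (delta, seconds) in counter_deltas(ROWS).items():
    print(f"{interface} {name}: +{delta} in {seconds:.0f} s")
# fc1/13 F16_TMM_TOLB_TIMEOUT_DROP_CNT: +40968 in 20 s
# fc1/13 FCP_SW_CNTR_TX_WT_AVG_B2B_ZERO: +16 in 20 s
# fc1/13 FCP_SW_CNTR_CREDIT_LOSS: +17 in 20 s
```

Roughly 41,000 timeout drops and 17 credit loss recoveries within one 20-second polling interval indicate that fc1/13 was severely congested at that moment.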
  47. 47. Case Study 2: Host ISLs on edge switch dropping frames
      • Hosts are reporting latency/errors
      • First notice timeout drops (Tx) occurring on the storage edge switch
        • Use show logging onboard starttime <date-time> error-stats
      • Follow Tx congestion to the core switches
      • Follow Tx congestion to the host edge switch
      • Follow Tx congestion to the offending host
      [Diagram: storage edge, Core#1 and Core#2, host edge, host; timeout drops are observed along the path]
  48. 48. Summary
  49. 49. Cisco MDS architecture advantage
      • Throughput & latency: consistent
      • Performance at different traffic loads & types: predictable
      • Drops corrupt frames by CRC checking at all stages
      • Non-blocking arbitrated crossbar architecture
      • Never drops a good frame under congestion
  50. 50. MDS, Nexus & DCNM Slow Drain Advantage
      • Detection, troubleshooting, and automatic recovery features: slow port, stuck port, slow port monitoring, credit transition to zero, credit and remaining credit, info of dropped frames, frames in ingress queue, OBFL logging, port monitoring, virtual output queues, stuck port recovery, LR Rcvd B2B, congestion drop, no-credit-drop
      • DCNM: fabric-wide visibility, automatic collection and graphical display of counters, reduced false positives
      • HW assisted: detection at 1 ms, immediate action
  51. 51. Key Takeaways: Cisco MDS, Nexus & DCNM build self-healing fabrics
  52. 52. Resources
  53. 53. Cisco Live! San Diego June 7 – 11, 2015 BRKSAN-3446 SAN Congestion Understanding, Troubleshooting, Mitigating in a Cisco Fabric by Ed Mazurek
  54. 54. Slow Drain Reference
      • Understanding Slow Drain: Detection, Troubleshooting & Automatic Recovery: https://www.youtube.com/watch?v=wEz3z6NLaBU&list=PL_ju2fKFbFzVMZgXAHV9kZ6FT93BuG0eB
      • White paper "Slow Drain Device Detection and Congestion Avoidance": http://www.cisco.com/c/en/us/products/collateral/storage-networking/mds-9700-series-multilayer-directors/white_paper_c11-729444.html
      • Cisco Live session BRKSAN-3446 by Ed Mazurek, "MDS 9500 9710 Understanding Detecting Troubleshooting Mitigating Slow Drain in a Cisco Fabric": https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=78677&backBtn=true
      • Generation 4 slow drain counters, commands, and troubleshooting: http://www.cisco.com/c/en/us/support/docs/storage-networking/mds-9509-multilayer-director/116098-trouble-gen4-00.html
