Handling BGP Attribute Errors
                      Rob Shakir (GX Networks)
                      rjs@eng.gxn.net / RJS-RIPE




Monday, 18 May 2009
Outline / Motivation


                      • BGP Errors - Current Handling
                      • AS4_PATH Bug and Optional Transitives
                       • Update to RFC 4893
                      • IETF IDR Drafts
                      • Why you should care!
Attributes and Errors
                      • Types of BGP Attributes
                       • Well-known Mandatory
                       • Well-known Discretionary
                       • Optional Transitive
                       • Optional Non-Transitive
                      • RFC 4271
                       •   “A NOTIFICATION message is sent when an error condition is
                           detected. The BGP connection is closed immediately after it
                           is sent.”
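The four attribute categories above map onto the Optional and Transitive bits of the attribute flags octet in RFC 4271. The following is a minimal sketch of that mapping; the function name `classify_flags` is hypothetical. Note that the flags alone cannot distinguish well-known mandatory from well-known discretionary: that depends on the attribute type code.

```python
# Bits of the BGP path attribute flags octet (RFC 4271, section 4.3).
OPTIONAL = 0x80         # 0 = well-known, 1 = optional
TRANSITIVE = 0x40       # 1 = passed on to other peers
PARTIAL = 0x20          # 1 = some speaker on the path forwarded it unrecognised
EXTENDED_LENGTH = 0x10  # 1 = two-octet attribute length field


def classify_flags(flags: int) -> str:
    """Classify an attribute by its flags octet (hypothetical helper)."""
    optional = bool(flags & OPTIONAL)
    transitive = bool(flags & TRANSITIVE)
    if not optional:
        # Well-known attributes must always have the Transitive bit set.
        return "well-known" if transitive else "invalid"
    return "optional transitive" if transitive else "optional non-transitive"
```

For example, `classify_flags(0xC0)` corresponds to an attribute such as AGGREGATOR or AS4_PATH, while `classify_flags(0x40)` corresponds to a well-known attribute such as AS_PATH.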

Current Error Handling (1)
                      • AS_PATH Error (Well-known Mandatory)
                       • Worst case - loops and invalid routing.
                      [Diagram: AS65300 and AS65400 peer over eBGP. An invalid
                       AS_PATH on the session results in NOTIFICATION and teardown.]

Current Error Handling (2)
                      • Aggregator Error (Optional Transitive)
                       • Worst case? Loss of routing metadata?
                      [Diagram: two routers in AS65300 peer over iBGP. An invalid
                       AGGREGATOR on the session results in NOTIFICATION and teardown.]

Problem?

                      • All errors are treated equally.
                      • Is this the right behaviour?
                       • “Good, we’re being cautious!”
                       • “Why is my AS suddenly disconnected
                          from the global table?”



AS4_PATH

                      • Defined in RFC 4893 (Optional Transitive)
                      AS70000 ---- eBGP ---- AS65400 ---- eBGP ---- AS71000

                                  AS70000        AS65400            AS71000
                      Speaker:    AS4 Speaker    Non-AS4 Speaker    AS4 Speaker
                      AS4_PATH:   Not Used       70000              Not Used
                      AS_PATH:    i              23456 i            65400 70000 i
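The example above can be sketched in code. This is a simplified illustration, assuming paths are flat AS_SEQUENCE lists (no AS_SET or confederation segments); the function names `encode_for_old_speaker` and `reconstruct` are hypothetical. The reconstruction follows the RFC 4893 rule of prepending the leading ASes of AS_PATH that AS4_PATH does not cover.

```python
AS_TRANS = 23456  # reserved 2-byte ASN standing in for any 4-byte ASN (RFC 4893)


def encode_for_old_speaker(path):
    """Return (as_path, as4_path) as a NEW speaker would send to an OLD speaker."""
    as_path = [asn if asn <= 0xFFFF else AS_TRANS for asn in path]
    # AS4_PATH is only needed when some ASN does not fit in two bytes.
    as4_path = list(path) if any(asn > 0xFFFF for asn in path) else None
    return as_path, as4_path


def reconstruct(as_path, as4_path):
    """Rebuild the full path on a NEW speaker from AS_PATH and AS4_PATH."""
    if as4_path is None or len(as_path) < len(as4_path):
        # No AS4_PATH, or it is longer than AS_PATH: use AS_PATH alone.
        return list(as_path)
    lead = len(as_path) - len(as4_path)
    return as_path[:lead] + as4_path


# AS70000 announces to the OLD speaker AS65400, which prepends itself
# and passes AS4_PATH on untouched to AS71000.
as_path, as4_path = encode_for_old_speaker([70000])
as_path_at_71000 = [65400] + as_path
full_path = reconstruct(as_path_at_71000, as4_path)
```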
Neat! And Errors?

                      •   Shouldn’t really see errors!

                          •   Cleaned like AS_PATH

                      •   Mixed NEW and OLD confederations

                          •   “To prevent the possible propagation of
                              confederation path segment outside of a
                              confederation, the path segment types
                              AS_CONFED_SEQUENCE and AS_CONFED_SET
                              [RFC3065] are declared invalid for the AS4_PATH
                              attribute” (RFC 4893)

Whoops!

                      • December 10th 2008
                          •   91.207.218.0/23

                          •   AS4_PATH: (65044 65057) 196629 (7 bytes)

                          •   AS_PATH: xx xx 35320 23456 (13 bytes)

                      •   Confederation information in AS4_PATH

                          •   First RFC-compliant NEW speaker to see the
                              UPDATE tears down the session to where it saw
                              the UPDATE from.
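A receiver could detect this condition along the following lines. This is a sketch only: it assumes the usual path segment encoding (one-octet type, one-octet AS count, then four-octet ASNs in AS4_PATH), and the function names are hypothetical. Whether the confederation portion of the December 2008 UPDATE was an AS_CONFED_SEQUENCE or an AS_CONFED_SET is not assumed here; the check covers both.

```python
import struct

# Path segment type codes (RFC 4271 and the BGP confederations RFC).
AS_SET, AS_SEQUENCE, AS_CONFED_SEQUENCE, AS_CONFED_SET = 1, 2, 3, 4


def parse_as4_path(data: bytes):
    """Parse an AS4_PATH attribute value into a list of (type, [asns])."""
    segments = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated segment header")
        seg_type, count = data[offset], data[offset + 1]
        offset += 2
        end = offset + 4 * count  # AS4_PATH always carries 4-octet ASNs
        if end > len(data):
            raise ValueError("truncated segment body")
        asns = list(struct.unpack(f"!{count}I", data[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def has_confed_segments(segments) -> bool:
    """True if any segment type is declared invalid for AS4_PATH by RFC 4893."""
    return any(t in (AS_CONFED_SEQUENCE, AS_CONFED_SET) for t, _ in segments)
```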

What went wrong?

                      • ASN running mixed confeds with mixed
                        OLD/NEW speakers and JunOS.


       AS65301 ---- eBGP ---- AS65302 ---- eBGP ---- AS65303

       AS65301 (AS4 Speaker):      Copies AS_CONFED_SET into AS4_PATH
       AS65302 (Non-AS4 Speaker):  Transits AS4_PATH (not checked!)
       AS65303 (AS4 Speaker):      Invalid AS4_PATH received: sends NOTIFICATION
                                   and teardown



Why is this concerning?

           AS35320 ---- Global Table ---- AS3356 ---- eBGP ---- AS5413

           AS35320:  AS running JunOS and Confeds
           AS3356:   Transit Provider
           AS5413:   Arbitrary ASN, AS4-aware Border

                      •   First RFC-compliant AS4 speaker in the path
                          reacts.

                      •   Teardown can be towards transit (likely affecting
                          every prefix on these sessions!)

                      •   An UPDATE can be crafted to reach a target via a
                          specific path.
Our Recommended Fix

                      • Recommended: Don’t send
                        NOTIFICATION, treat UPDATE as
                        withdrawal of prefix via this path.
                       • “Punish” broken paths without breaking
                         every prefix via a session.
                       • Prefix might become unreachable.
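The treat-as-withdraw behaviour can be sketched as follows. This is an illustration under simplifying assumptions, not an implementation from any router: the per-session Adj-RIB-In is modelled as a plain dict, and `handle_update` and the `is_valid` callback are hypothetical names.

```python
def handle_update(adj_rib_in, prefixes, attrs, is_valid):
    """Apply one UPDATE to a per-session table (dict: prefix -> attributes).

    If the attributes fail validation, the affected prefixes are removed
    as if they had been withdrawn. The session and all other prefixes
    learned over it are left untouched.
    """
    if is_valid(attrs):
        for prefix in prefixes:
            adj_rib_in[prefix] = attrs
        return "accepted"
    for prefix in prefixes:
        # Drop any previously learned path for this prefix via this session.
        adj_rib_in.pop(prefix, None)
    return "treated-as-withdraw"
```

The trade-off noted on the slide is visible here: a prefix whose only path arrived over this session disappears from the table, but unrelated prefixes survive.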

Likely RFC Fix

                      • draft-ietf-rfc4893bis
                      • Ignore the broken parts of the AS4_PATH.
                       • IOS implemented this: 12.0(32)S(Y8|13)
                       • Doesn’t lose reachability, and recovers
                          from an error “in the wild”
                       • Some implications in loop detection?
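Ignoring the broken parts of the AS4_PATH might look like the sketch below, which assumes the attribute has already been parsed into (segment type, ASN list) pairs. The function name `strip_confed_segments` is hypothetical, and this reflects one reading of the approach on the slide rather than the draft's exact text.

```python
# Confederation segment types, which RFC 4893 declares invalid in AS4_PATH.
AS_CONFED_SEQUENCE, AS_CONFED_SET = 3, 4


def strip_confed_segments(segments):
    """Drop confederation segments from a parsed AS4_PATH.

    Returns (kept_segments, was_modified) so the caller can log the event
    while continuing to process the UPDATE.
    """
    kept = [
        (seg_type, asns)
        for seg_type, asns in segments
        if seg_type not in (AS_CONFED_SEQUENCE, AS_CONFED_SET)
    ]
    return kept, len(kept) != len(segments)
```

Applied to the December 2008 AS4_PATH, this would leave only the segment containing 196629, so reachability is preserved. The loop-detection question on the slide remains: the discarded ASNs no longer contribute to the reconstructed path.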
AS_PATH and AS4_PATH

                      • Last LINX meeting - AS_PATH length
                        problems.
                        • Different Case: Well-known Mandatory
                      • Highlights interesting point relating to
                        AS4_PATH - loop detection for AS4?
                      • Bugs will always mean that invalid
                        information is propagated.


The General Case

                      •   draft-ietf-rfc4893bis fixes this specific case, but
                          what about others?

                      •   Errors in other optional transitives still cause
                          session teardown.

                          •   Revise this behaviour? Don’t require
                              NOTIFICATION be sent.

                      •   Tell our neighbour that someone in their path did
                          something wrong?



draft-scudder-idr-optional-transitive

                      • Handles the case of Optional Transitives
                        that are not formed or checked by our
                        neighbour
                      • Partial bit is set to 1 if some BGP speaker
                        passes without checking.
                        • These are the “tunneled” UPDATEs
                      • Recommended behaviour: Treat as a
                        withdrawal of the prefix and log.
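The decision described above can be sketched as a function of the attribute flags. This follows the slide's summary only and is not taken from the draft text; the function name `action_for_malformed` and the returned strings are hypothetical.

```python
# Bits of the attribute flags octet (RFC 4271).
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20


def action_for_malformed(flags: int) -> str:
    """Choose how to react to a malformed attribute, based on its flags."""
    optional_transitive = bool(flags & OPTIONAL) and bool(flags & TRANSITIVE)
    if optional_transitive and flags & PARTIAL:
        # Some speaker on the path passed the attribute along unchecked, so
        # the direct neighbour is not necessarily the one at fault.
        return "treat-as-withdraw-and-log"
    # Otherwise fall back to the existing RFC 4271 behaviour.
    return "notification-and-teardown"
```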

draft-scholl-idr-advisory

                      •   New MP-BGP capability (ADVISORY): allows a string
                          to be transmitted between two routers.

                      •   NOT a replacement for NOTIFICATION

                      •   Inform our neighbour that we’re considering an
                          UPDATE as invalid.

                      •   Not just error handling:

                          •   “in-band” notification (e.g. maintenance)



draft-nalawade-bgp-soft-notify
                      • Has been some opposition to ADVISORY
                       • Humans already have phone and e-mail!
                      • SOFT-NOTIFICATION previous suggestion
                        (2003)
                       • Intended to allow for graceful recovery
                          from an error.
                      • Structured payload (no IM via BGP!)
Implications of these Drafts

                      • Protocol-wise, this isn’t core functionality
                       • Vendors and protocol-purists not
                          necessarily interested?
                      • Operationally, we need to be robust!
                       • Do we trust everyone in the global table?
                      • Easier communication of events and
                        settings directly between operators.
                        • Capability (you can turn it off!)
Conclusions

                      • Blanket handling of BGP errors is
                        suboptimal.
                      • Fix handling optional transitive errors
                        (make the protocol more robust!)
                      • Add method to communicate these errors
                        without tearing sessions down.
                      • Operators’ voices are really needed here!
Questions, Comments,
                          Corrections?
                        Many thanks to:
                        Andy Davidson (NetSumo)
                        Jonathan Oddy (Hostway)
                        David Freedman (Claranet)
                        Will Hargrave (LONAP)
                        Greg Hankins (Force10)



Questions, or comments later?
                              rjs@eng.gxn.net
                                  RJS-RIPE

                                  Public Comments?
                                 IETF IDR - idr@ietf.org
                      (To Subscribe: idr-request@ietf.org, In Body: subscribe idr-post)





LINX65 - Handling BGP Attribute Errors (Rob Shakir)
