Internet-Draft | EVPN Fast Reroute | March 2024 |
Burdet, et al. | Expires 5 September 2024 | [Page] |
This document summarises EVPN convergence mechanisms and specifies procedures for EVPN networks to achieve fast and scale‑independent convergence.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 5 September 2024.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
EVPN convergence and failure recovery methods for different types of network failures are described in Section 17 of [I-D.ietf-bess-rfc7432bis]. Similarly for EVPN‑VPWS, the end of Section 5 of [RFC8214] briefly mentions an egress link protection mechanism.¶
The fundamentals of EVPN convergence rely on a mass‑withdraw technique of the Ethernet A-D per ES route to unresolve all the associated forwarding paths (Section 9.2.2 of [I-D.ietf-bess-rfc7432bis] 'Route Resolution'). The mass‑withdraw grouping approach results in suitable EVPN convergence at lower scale, but is not sufficient to meet stricter convergence requirements, often sub-second. Other control-plane enhancements such as route‑prioritisation ([I-D.ietf-bess-rfc7432bis]) help further but still provide no guarantees.¶
EVPN convergence using only control-plane approaches is constrained by BGP route propagation delays, route processing times in software, and hardware programming. In addition, these steps are often performed sequentially, so convergence time grows linearly with the potentially large number of EVPN routes present in the control plane.¶
This document presents a mechanism for fast reroute to minimise packet loss in the case of a link failure using EVPN redirect labels (ERLs) with special forwarding behaviors. Multiple-failures where loops may occur are addressed, as are cascading failures. A mechanism for distributing redirect labels (ERLs) alongside EVPN service labels (ESLs) is shown.¶
The main objective is to achieve fast convergence in EVPN networks without relying on control plane actions. The procedures in this document apply to the following EVPN services: EVPN [I-D.ietf-bess-rfc7432bis], EVPN-VPWS [RFC8214], EVPN Inter-Subnet Forwarding [RFC9135] and EVPN IP-VRF-to-IP-VRF models as in Section 4.4 of [RFC9136]. All the EVPN Multi-Homing modes are included.¶
Some of the terminology in this document is borrowed from [RFC8679] for
consistency across fast reroute frameworks.
The term 'label' when used in this document,
especially when referring to ERL and ESL (below) indicates an MPLS label, a VNI (VXLAN Network
Identifier) or a Segment Routing IPv6 SID, depending on the transport being used.¶
Fast convergence in EVPN networks is achieved using a combined approach to minimising traffic loss:¶
The solution presented in this document addresses local failure detection and restoration, without impeding or otherwise impacting existing EVPN control plane convergence mechanisms.¶
Consider the following EVPN topology where PE1 and PE2 are multihoming PEs on a shared ES, ESI1. EVPN (known unicast) or EVPN‑VPWS traffic from CE1 to CE2 is sent to PE1 and PE2 using EVPN service labels ESL1 and/or ESL2 (depending on load-balancing mode of the ESI1 interfaces).¶
Alongside the service labels ESL1 and ESL2, two redirect labels ERL1 and ERL2 are allocated with special forwarding behaviors, as detailed in Section 5. Fast-reroute and use of the ERLs is shown in Section 4.2.¶
EVPN DF-Election lends itself well to the selection of a pre-computed path amongst any given number of peering PEs by providing a DF‑Elected and BDF‑Elected node at the <EVI, ESI> granularity ([RFC8584] and [I-D.ietf-bess-rfc7432bis]).¶
In All-active mode, all PEs in the Ethernet Segment are actively forwarding known unicast
traffic to the CE. For All-active services where DF‑Election is not strictly required
(EVPN-VPWS) the DF-Election algorithm is run to determine BDF-Elected PE for ERL selection
purposes only, without impacting the service itself.
In Single-active and Port-Active modes, only a single PE in the Ethernet Segment is actively forwarding known unicast
traffic to the CE: the DF-Elected PE. The BDF-Elected PE is next to be
elected in the redundancy group and is already known.
In Single-flow-active mode ([I-D.ietf-bess-evpn-l2gw-proto]), only a single PE in the Ethernet Segment is actively forwarding known
unicast to the CE for a given flow: the PE which initially
received that flow from the Ethernet-Segment. The backup PE is the multihoming peer in the
redundancy group, referred to as "BDF" for consistency with other redundancy modes.¶
For consistency across PEs and load-balancing modes, the backup path selected should be in order of {DF, BDF, NDF1, NDF2, ...}. The DF-Elected PE selects the next-best BDF-Elected as backup and all BDF- and NDF-Elected nodes select the best DF-Elected for the protection of their egress links.¶
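The backup selection rule above can be sketched as follows. This is a minimal illustration only: the function name `select_backup` and the representation of the DF-Election result as an ordered list are assumptions made for the example, not part of this document's procedures.

```python
from typing import List, Optional


def select_backup(local_pe: str, election_order: List[str]) -> Optional[str]:
    """Pick the pre-computed backup PE for one <EVI, ESI>.

    election_order is assumed to be the DF-Election result ranked as
    [DF, BDF, NDF1, NDF2, ...] (a hypothetical representation).
    """
    if local_pe not in election_order:
        raise ValueError("local PE is not part of the redundancy group")
    if len(election_order) < 2:
        return None  # no peer available, nothing to protect with
    df, bdf = election_order[0], election_order[1]
    # The DF-Elected PE protects its egress link with the BDF-Elected PE;
    # every BDF- and NDF-Elected PE protects its egress link with the DF.
    return bdf if local_pe == df else df
```

With three peering PEs ranked PE1, PE2, PE3, this sketch yields PE2 as the backup for PE1, and PE1 as the backup for both PE2 and PE3.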
The use of PE2's ERL2 as redirect label applies to local failures in all load-balancing modes at PE1.¶
The number of peering PEs is not limited by existing DF-Election algorithms. A solution based on DF-Election supports subsequent redirection upon multiple cascading failures, once a new DF-Election has occurred. Pre-selection of a backup path is supported by all current DF-Election algorithms, and more generally by all algorithms supporting BDF-Election, as recommended in ([I-D.ietf-bess-rfc7432bis]).¶
The procedures for forwarding known unicast packets received from a remote PE on the local redirect label follow Section 13.2.2 of [I-D.ietf-bess-rfc7432bis] for known unicast traffic. Since the CE next-hop forwarding information reflects the current BDF state of the AC, additional steps to bypass the blocking state and to prevent another redirection are applied, as described further in this document.¶
Consider the EVPN multihoming topology in Figure 1, and a traffic flow from CE1 to CE2 which is currently using EVPN service label ESL1 and forwarded through the core arriving at PE1. When the local AC representing the <EVI,ESI> pair is protected using the fast-reroute solution, the pre-computed backup path's redirect label (i.e. ERL2 from BDF-Elected PE2) is installed against the AC.¶
Under normal conditions, PE1 disposition using ESL1 will result in forwarding the packet to the CE by selecting the local AC associated with the EVPN service label ([RFC8214], [I-D.ietf-bess-rfc7432bis]). When this local AC is in failed state, the fast-reroute solution at PE1 will begin rerouting packets using the BDF-Elected peer's nexthop and ERL2. ERL2 is chosen for redirected traffic and not ESL2 to prevent loops and overcome DF-Election timing as described in Sections 5.2 and 5.1 respectively.¶
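The disposition decision at PE1 for a packet received on its service label can be sketched as below. The `AttachmentCircuit` type, its fields and the returned tuples are hypothetical names chosen for illustration; an actual forwarding plane would hold this state in hardware.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AttachmentCircuit:
    """Hypothetical per-<EVI, ESI> forwarding state at the disposition PE."""
    name: str
    up: bool
    backup_nexthop: Optional[str] = None  # e.g. the BDF-Elected peer
    backup_erl: Optional[int] = None      # redirect label advertised by that peer


def dispose_service_label(ac: AttachmentCircuit) -> Tuple:
    """Decide what to do with a packet that arrived on the ESL of this AC."""
    if ac.up:
        return ("forward", ac.name)
    if ac.backup_nexthop is not None and ac.backup_erl is not None:
        # Local AC failed: reroute using the peer's ERL, not its ESL.
        return ("redirect", ac.backup_nexthop, ac.backup_erl)
    return ("drop",)
```

For example, an AC that is down with backup next hop PE2 and a redirect label of 24002 (an arbitrary value) yields a redirect action towards PE2 with that label.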
In EVPN multihoming where the CE connects to peering PEs through link aggregation (LAG), a single LAG failure at the CE may manifest as multiple ES failures at all peering PEs simultaneously.¶
As all peering PEs would enable the fast-reroute mechanism simultaneously, traffic would be redirected back and forth between them until the TTL expires, causing a traffic storm.¶
Once-redirected traffic may not be redirected again, according to the terminal nature of ERLs described in Section 5.2.¶
Trying to support cascading failures by redirecting once-redirected traffic is substantially equivalent to simultaneous failures above.¶
Once-redirected traffic may not be redirected again, according to the terminal nature of ERLs described in Section 5.2, and loss is to be expected in double-failure scenarios until the EVPN control plane reconverges.¶
In a scenario with 3 peering PEs (PE1-DF, PE2-BDF, PE3-NDF) where PE1 fails, followed by a PE2 failure before control-plane reconvergence, there is no reroute of traffic towards PE3 because the reroute-label is terminal.¶
In such rapid-succession failures, it is expected that control plane must first correct for the initial failure and DF-Elect PE2 as new‑DF and PE3 as the new‑BDF. PE2 to PE3 redirection would then begin, unless control-plane is rapid enough to correct directly, and elect PE3 new-DF.¶
EVPN redirect labels MUST be downstream assigned, and each is directly associated with the <EVI,ESI> AC being egress protected. The special forwarding characteristics and use of an EVPN redirect label (ERL) described below are a matter of local significance only to the advertising PE (which is also the disposition PE).¶
The special behaviors of ERLs do not affect any other PEs or transit P nodes. There are no extra labels appended to the label stack in the IP/MPLS network and the ERL appears to label-switching transit nodes as would any other EVPN service label. Since they appear as EVPN service labels, ERLs do not have any impact on Flow-Label or Control-Word procedures in [I-D.ietf-bess-rfc7432bis].¶
Two special forwarding characteristics and behaviors of EVPN redirect labels are described below to mitigate these issues.¶
Local detection and restoration at DF-Elected PE1 will begin rapidly redirecting traffic onto the
backup path selected (PE2).
Redirected packets will arrive at the Backup-DF port much faster than control plane
DF-Election at the Backup-DF peer is capable of unblocking its local egress link for the
shared ES (ESI1). All redirected traffic would drop at Backup-DF and
no net reduction in traffic loss is achieved.¶
Traffic restoration remains dependent upon ES route or Ethernet A-D per ES/EVI routes withdrawal for a DF-Election operation and for PE1 to assume the traffic forwarding role. This is especially important in single-active load-balancing mode where known unicast traffic is blocked.¶
To mitigate this, the redirect labels allocated must carry a special attribute in the local forwarding and decapsulation chain: for traffic received on the ERL when the AC is up, an override to the DF‑Election is applied and traffic from the ERL will bypass the local Backup-DF blocking state. Once EVPN control plane reconverges, traffic from the ERL will cease and the optimal forwarding path based on ESLs will resume.¶
The EVPN redirect label MUST carry a context locally, such that from disposition to
egress, redirected packets are allowed to bypass the Backup-DF blocking state that would otherwise
drop them. Similarly, this may open the gate to the traffic in the reverse direction.
In Port-Active mode, the Backup-DF interface may signal Out-of-Service but remain in Up/Backup
state: to support EVPN Fast Reroute, the CE must be able to receive traffic from an OOS
LAG link.¶
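The DF-Election bypass described above amounts to a small difference in the egress check applied after disposition. The sketch below assumes a hypothetical `egress_permitted` helper and a boolean `df_blocked` flag standing for the Backup-DF blocking state of the AC; neither name comes from this document.

```python
def egress_permitted(label_kind: str, ac_up: bool, df_blocked: bool) -> bool:
    """Return True if a known unicast packet may be sent out of the AC.

    label_kind is "ESL" for a regular service label or "ERL" for a
    redirect label; df_blocked models the Backup-DF blocking state.
    """
    if not ac_up:
        return False  # a failed AC cannot forward under either label
    if label_kind == "ERL":
        # Redirected traffic overrides the DF-Election blocking state.
        return True
    if label_kind == "ESL":
        return not df_blocked
    raise ValueError(f"unknown label kind: {label_kind}")
```

In this model a Backup-DF PE with its AC up drops traffic arriving on its ESL but forwards traffic arriving on its ERL, which is the behavior needed while the control plane has not yet reconverged.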
The reroute scheme is susceptible to loops and persistent redirects between peering PEs which have set up FRR redirection. Consider the scenario where both CE-facing interfaces fail simultaneously: fast reroute will be activated at both PE1 and PE2, effectively bouncing a redirected packet between the two PEs indefinitely (or until the TTL expires) and causing a traffic storm.¶
To prevent this, a distinction is made between 'regular' EVPN service labels for disposition (i.e. known unicast EVI label or EVPN-VPWS label) and reroute labels with terminal disposition.¶
At the redirecting PE2, we consider the case of ESL2 vs. ERL2, where both are locally allocated and provided in EVPN routes (downstream allocation) to BGP peers:¶
EVPN Service label, ESL2:¶
EVPN Reroute label, ERL2:¶
The ERL acts like a local cross-connect by providing a direct channel from disposition to the AC. ERLs are terminal-disposition and prevent once‑redirected packets from being redirected again. With this forwarding attribute on ERLs, known only locally to the downstream-allocating PE, redirection is achieved without growing the label stack with another special purpose label.¶
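The effect of terminal disposition on the simultaneous-failure case can be illustrated with a toy simulation. Everything here is an assumption made for the example: the `deliver` function, the dictionary layout of the PEs, and the `terminal` switch that contrasts redirecting on the peer's ERL with (incorrectly) redirecting on the peer's ESL.

```python
from typing import Dict, List, Tuple


def deliver(pes: Dict[str, dict], start: str, terminal: bool = True,
            max_hops: int = 8) -> Tuple[List[str], str]:
    """Follow one packet that arrives at `start` on a service label (ESL).

    pes maps a PE name to {"ac_up": bool, "backup": peer name}.
    With terminal=True the redirect uses the peer's ERL, which may not be
    redirected again; with terminal=False it uses the peer's ESL.
    """
    path = [start]
    current, kind = start, "ESL"
    for _ in range(max_hops):
        pe = pes[current]
        if pe["ac_up"]:
            return path, "delivered"
        if kind == "ERL":
            return path, "dropped"  # terminal disposition: no second redirect
        current = pe["backup"]
        kind = "ERL" if terminal else "ESL"
        path.append(current)
    return path, "loop"  # hop budget exhausted, standing in for TTL expiry
```

When both ACs are down, the ERL variant drops the packet at the second PE after a single redirect, whereas the ESL variant keeps bouncing the packet between PE1 and PE2 until the hop budget runs out.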
Fast reroute mechanisms such as the one described in this document generally provide a way to preserve traffic flows at failure time. Use of fast reroute in EVPN, however, permits setting up a controlled recovery sequence to shorten the period of loss between an interface coming up and the EVPN DF-Election procedures and default timers for peer discovery.¶
The benefit of a controlled recovery sequence is amplified when used in conjunction with [I-D.ietf-bess-evpn-fast-df-recovery] (synchronised DF-Election).¶
The solution is agnostic to transport underlays, for instance similar behavior is carried forward for NVO tunnels (VXLAN) and SRv6.¶
The rerouting procedures and behaviors in this document apply as well for [RFC8365] NVO tunnels.¶
For MPLS-based NVO tunnels, i.e. MPLSoGRE, MPLSoUDP, etc., no additional behaviors are required.¶
For non-MPLS NVO tunnels, the labels are 24-bit VNIs, not downstream assigned and usually global, i.e. same value for all the PEs attached to the BD. In this case, the rerouting mechanisms described in this document would not work without some additional behaviors: the rerouting mechanism needs to avoid local-bias split-horizon filtering upon reception of the redirected packets. For non-MPLS NVO tunnels, an additional identifier is advertised in Ethernet A-D per EVI routes to enable EVPN Fast Reroute.¶
Non-MPLS NVO tunnel encapsulations may use local-bias procedures instead of
ES label-based split-horizon (for EVPN multihoming).
This means that, e.g. when PE1 sends redirected traffic to multihoming peer PE2 with
the ERL VNI, PE2 will drop the packets due to the filtering based on the tunnel source IP.
To support non-MPLS NVO tunnels such as VXLAN, PE2 in the example above needs to bypass the
source IP based filtering if the VNI identifies a local redirection instance.
The split-horizon filtering would be based on source-IP + FRR-VNI, as opposed to source-IP
only.
Since the VNI is global and not e.g. downstream-assigned, a VNI must be allocated per
ES,EVI for the rerouting mechanisms described in this document to apply.¶
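The modified split-horizon check for non-MPLS NVO tunnels can be sketched as follows. The function `local_bias_drop`, the set of multihoming peer addresses and the set of FRR VNIs are hypothetical structures used only to illustrate filtering on source IP plus FRR-VNI instead of source IP alone.

```python
from typing import Set


def local_bias_drop(src_ip: str, vni: int,
                    es_peer_ips: Set[str], frr_vnis: Set[int]) -> bool:
    """Return True if the packet must be filtered before reaching the shared ES.

    es_peer_ips: tunnel source addresses of PEs sharing the local ES.
    frr_vnis: VNIs allocated locally as redirect (ERL) identifiers.
    """
    if src_ip not in es_peer_ips:
        return False  # not from a multihoming peer, local bias does not apply
    # From a multihoming peer: filter unless the VNI marks redirected traffic.
    return vni not in frr_vnis
```

With a peer address of 192.0.2.1 and an FRR VNI of 9001 (both arbitrary), traffic from that peer on an ordinary VNI is still filtered, while traffic from the same peer on VNI 9001 is accepted.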
Ethernet A-D per EVI routes are advertised along with the Service SID used for End.DX2 or End.DT2U behaviors (Section 6.1.2 of [RFC9252]). These advertisements correspond to the ESL behavior in this document (EVPN Service SID). An additional EVPN Redirect SID is advertised in Ethernet A-D per EVI routes to enable EVPN Fast Reroute, with one of two new SRv6 Endpoint Behaviors. At the redirecting PE1, the EVPN Redirect SID is used to implement the ERL behaviors described in Section 4.2.¶
The "End.DT2U with Fast Reroute" behavior ("End.DT2U.Reroute" for short) is a variant of the End.DT2U behavior.¶
The End.DT2U.Reroute behavior is defined for the fast-reroute application between two EVPN multi-homing peers, and extends the base End.DT2U behavior. This behavior takes an optional Fast Reroute argument: "Arg.FR2". This argument provides a local mapping to Attachment Circuit (EVI/ESI) for the received traffic, which also implements the forwarding behaviors in Section 5.¶
Any SID instance of this behavior may be used in two ways:¶
Thus, the SID entry for this behavior when instantiated in the FIB performs the disposition of both base L2 Table traffic (i.e., the base End.DT2U behavior) as well as rerouted traffic (i.e., the End.DT2U+Arg.FR2 handling). End.DT2U processing is as in Section 4.11 of [RFC8986].¶
When processing the Upper-Layer header of a packet matching a FIB entry locally instantiated as an End.DT2U.Reroute SID, N does the following:¶
S01. If (Upper-Layer header type == 143(Ethernet) ) {
S02.    Remove the outer IPv6 header with all its extension headers
S03.    If (Arg.FR2 is 0) {
S04.       Process as per Section 4.11 of [RFC8986] (End.DT2U)
S05.    } Else {
S06.       Lookup the egress interface L2 OIF I for Arg.FR2
S07.       If (L2 OIF interface I is down) {
S08.          Drop the Ethernet frame
S09.       } Else {
S10.          Forward the Ethernet frame to the OIF I bypassing
              any EVPN DF-Election blocking state
S11.       }
S12.    }
S13. } Else {
S14.    Process as per Section 4.1.1 of [RFC8986]
S15. }¶
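The disposition steps above translate almost directly into code. The sketch below is illustrative only: the function name, the `oif_table` mapping from Arg.FR2 to an interface record, and the returned action tuples are assumptions. Treating an Arg.FR2 value with no local mapping as a drop is also an assumption, since the pseudocode does not cover that case.

```python
from typing import Dict, Tuple

ETHERNET_UPPER_LAYER = 143  # Upper-Layer header type for Ethernet [RFC8986]


def end_dt2u_reroute(upper_layer_type: int, arg_fr2: int,
                     oif_table: Dict[int, dict]) -> Tuple:
    """Decide the action for a packet matching an End.DT2U.Reroute SID.

    oif_table maps Arg.FR2 to {"name": str, "up": bool} (hypothetical).
    """
    if upper_layer_type != ETHERNET_UPPER_LAYER:
        return ("upper-layer-processing",)  # Section 4.1.1 of [RFC8986]
    # The outer IPv6 header and its extension headers are removed here.
    if arg_fr2 == 0:
        return ("end-dt2u",)  # base behavior, Section 4.11 of [RFC8986]
    oif = oif_table.get(arg_fr2)
    if oif is None or not oif["up"]:
        return ("drop",)  # no second redirect: the SID is terminal
    return ("forward-bypass-df", oif["name"])
```

The same structure applies to the End.DX2 variant, with the base case processed as End.DX2 instead of End.DT2U.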
To maintain backwards-compatibility, both End.DT2U.Reroute and End.DT2U Behavior SIDs MAY be advertised together whereby legacy receivers ignore the SRv6 SID of unknown behavior End.DT2U.Reroute.¶
The SRv6 L2 Service TLV in this case will carry two SRv6 SID Information sub-TLVs:¶
When advertised alongside an End.DT2U EVPN Service SID, the End.DT2U.Reroute EVPN Reroute SID MUST be identical to the End.DT2U except for the inclusion of an Argument Arg.FR2. Both SRv6 SIDs can use transposition since the function MUST be identical between the 2 SIDs. A receiver unable to validate the applicability of arguments for SRv6 Endpoint Behaviors that are unknown to it MUST ignore the End.DT2U.Reroute SID (Section 3.2.1 of [RFC9252]).¶
Following is an example representation of the BGP Prefix-SID Attribute encoding in this case for a 16-bit argument Arg.FR2 (0xaaaa):¶
When both End.DT2U.Reroute and End.DT2U are advertised, the ingress PE not performing reroute MUST use the End.DT2U as the EVPN Service SID.¶
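The requirement that the two SIDs be identical except for Arg.FR2 can be illustrated by building both from the same locator and function. The `build_sid` helper, the 48-bit locator, the 16-bit function value and the documentation prefix below are all assumptions chosen for the example; only the 16-bit argument value 0xaaaa is taken from the text.

```python
import ipaddress


def build_sid(locator: str, loc_len: int, function: int, func_len: int,
              argument: int = 0, arg_len: int = 16) -> ipaddress.IPv6Address:
    """Compose an SRv6 SID laid out as LOC:FUNCT:ARG within 128 bits."""
    if loc_len + func_len + arg_len > 128:
        raise ValueError("SID fields exceed 128 bits")
    if function >= 1 << func_len or argument >= 1 << arg_len:
        raise ValueError("function or argument does not fit its field")
    loc = int(ipaddress.IPv6Address(locator))
    if loc & ((1 << (128 - loc_len)) - 1):
        raise ValueError("locator has bits set beyond its length")
    func_shift = 128 - loc_len - func_len
    arg_shift = func_shift - arg_len
    value = loc | (function << func_shift) | (argument << arg_shift)
    return ipaddress.IPv6Address(value)


# Same locator and function; only the argument bits differ.
service_sid = build_sid("2001:db8:1::", 48, 0x0100, 16)
reroute_sid = build_sid("2001:db8:1::", 48, 0x0100, 16, argument=0xAAAA)
```

Because the locator and function bits are shared, the two SIDs differ only in the argument field, which is what allows both to use the same transposition.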
The "End.DX2 with Fast Reroute" behavior ("End.DX2.Reroute" for short) is a variant of the End.DX2 behavior.¶
The text in this section mirrors that of Section 7.2.1 (End.DT2U.Reroute) and is included for completeness' sake.¶
The End.DX2.Reroute behavior is defined for the fast-reroute application between two EVPN multi-homing peers, and extends the base End.DX2 behavior. This behavior takes an optional Fast Reroute argument: "Arg.FR2". This argument provides a local mapping to Attachment Circuit (EVI/ESI) for the received traffic, which also implements the forwarding behaviors in Section 5.¶
Any SID instance of this behavior may be used in two ways:¶
Thus, the SID entry for this behavior when instantiated in the FIB performs the disposition of both base L2 Table traffic (i.e., the base End.DX2 behavior) as well as rerouted traffic (i.e., the End.DX2+Arg.FR2 handling). End.DX2 processing is as in Section 4.9 of [RFC8986].¶
When processing the Upper-Layer header of a packet matching a FIB entry locally instantiated as an End.DX2.Reroute SID, N does the following:¶
S01. If (Upper-Layer header type == 143(Ethernet) ) {
S02.    Remove the outer IPv6 header with all its extension headers
S03.    If (Arg.FR2 is 0) {
S04.       Process as per Section 4.9 of [RFC8986] (End.DX2)
S05.    } Else {
S06.       Lookup the egress interface L2 OIF I for Arg.FR2
S07.       If (L2 OIF interface I is down) {
S08.          Drop the Ethernet frame
S09.       } Else {
S10.          Forward the Ethernet frame to the OIF I bypassing
              any EVPN DF-Election blocking state
S11.       }
S12.    }
S13. } Else {
S14.    Process as per Section 4.1.1 of [RFC8986]
S15. }¶
To maintain backwards-compatibility, both End.DX2.Reroute and End.DX2 Behavior SIDs MAY be advertised together. Receiving PEs SHOULD use the SRv6 SID from the first instance of the Sub-TLV only (Section 3.1 of [RFC9252]), and ignore the SRv6 SID of unknown behavior End.DX2.Reroute (Section 3.2.1 of [RFC9252]).¶
The SRv6 L2 Service TLV in this case will carry two SRv6 SID Information sub-TLVs:¶
When advertised alongside an End.DX2 EVPN Service SID, the End.DX2.Reroute EVPN Reroute SID MUST be identical to the End.DX2 except for the inclusion of an Argument Arg.FR2. Both SRv6 SIDs can use transposition since the function MUST be identical between the 2 SIDs. A receiver unable to validate the applicability of arguments for SRv6 Endpoint Behaviors that are unknown to it MUST ignore the End.DX2.Reroute SID (Section 3.2.1 of [RFC9252]).¶
Following is an example representation of the BGP Prefix-SID Attribute encoding in this case for a 16-bit argument Arg.FR2 (0xaaaa):¶
When both End.DX2.Reroute and End.DX2 are advertised, the ingress PE not performing reroute MUST use the End.DX2 as the EVPN Service SID.¶
End.DT2U.Reroute and End.DX2.Reroute are variants of their respective base behaviours and
when two SIDs are advertised together in an Ethernet A-D per EVI route, the variant
advertised MUST be a variant of the same base behaviour.
In other words, advertisement of an End.DT2U.Reroute variant alongside an End.DX2 base is
unusable and SHALL be discarded by receivers, and similarly an End.DX2.Reroute variant
advertised alongside an End.DT2U base SHALL be discarded by receivers.¶
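A receiver-side check for this pairing rule might look like the sketch below. The mapping `BASE_OF` and the function `usable_reroute_pair` are hypothetical names; behaviors are compared by name here because the code points are still to be assigned.

```python
# Each reroute variant and the base behavior it extends.
BASE_OF = {
    "End.DT2U.Reroute": "End.DT2U",
    "End.DX2.Reroute": "End.DX2",
}


def usable_reroute_pair(base_behavior: str, reroute_behavior: str) -> bool:
    """Return True if the variant may be used alongside the advertised base.

    A mismatched or unknown variant returns False, which the receiver
    would treat as a reason to discard it.
    """
    return BASE_OF.get(reroute_behavior) == base_behavior
```

Under this check, End.DT2U.Reroute paired with End.DT2U is accepted, while End.DT2U.Reroute paired with End.DX2 is rejected.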
EVPN multi-homing peers in different AS are rather an exception. In Inter-AS Option B or inter‑domain scenarios, the ASBR/ABR and BGP route-reflectors with nexthop-self procedures are extended:¶
While this document describes a new behavior, there are no new BGP extensions required to advertise the redirect label(s) used for EVPN egress link protection. The ESI Label Extended Community defined in Section 7.5 of [I-D.ietf-bess-rfc7432bis] may be advertised along with Ethernet A-D routes:¶
Prior to this document, advertising the ESI Label Extended Community along with an Ethernet A-D per EVI route (Ethertag different than MAX-ET) was undefined, and presumably ignored.¶
Remote PEs SHOULD NOT use the ERLs as a substitute for ESLs in route resolution. In particular, ERLs are not to be confused with the aliasing and backup path ESLs described and used in Section 8.4 of [I-D.ietf-bess-rfc7432bis].¶
The mechanisms in this document use the EVPN control plane as defined in [I-D.ietf-bess-rfc7432bis] and [RFC8214], and the security considerations described therein are equally applicable. Reroute labels redistributed in the EVPN control plane are meant for consumption by the peering PEs in the same ES. They are, however, visible in the EVPN control plane to remote peers. Care shall be taken when installing reroute labels, since their use may result in bypassing DF-Election procedures and lead to duplicate traffic at CEs if incorrectly installed.¶
Authors would like to thank Ketan Talaulikar for his review of SRv6 procedures in this document.¶
This document introduces two new Endpoint behaviors. This document requests that IANA assign two new values and update the "SRv6 Endpoint Behaviors" subregistry under the top-level "Segment Routing" registry as follows:¶
Value | Hex | Endpoint Behavior | Reference |
TBD | TBD | End.DT2U.Reroute | This document |
TBD | TBD | End.DX2.Reroute | This document |