<?xml version="1.0" encoding="US-ASCII"?>

<?xml-model href="rfc7991bis.rnc"?>

<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> --> 
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->

<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->

<rfc category="std"
    xmlns:xi="http://www.w3.org/2001/XInclude"
    docName="draft-ietf-bess-evpn-per-mcast-flow-df-election-10"
    consensus="true"
    submissionType="IETF"
    ipr="trust200902"
    tocInclude="true"
    tocDepth="4"
    symRefs="true"
    sortRefs="true">
  <!-- ***** FRONT MATTER ***** -->
  <front>
    <title abbrev="Per multicast flow Designated Forwarder Election for EVPN">Per multicast flow Designated Forwarder Election for EVPN</title>
    
    <author initials="Ali" surname="Sajassi" 
    fullname="Ali Sajassi">
      <organization>Cisco Systems</organization>
    
      <address>
       <postal>
         <street>821 Alder Drive,</street>
         
        <region>MILPITAS, CALIFORNIA 95035</region>
        
        <country>UNITED STATES</country>
       </postal>
       
       <phone></phone>
       <email>sajassi@cisco.com</email>
       </address>
    </author>

    <author initials="Mankamana" surname="Mishra" 
    fullname="Mankamana Mishra">
      <organization>Cisco Systems</organization>
    
      <address>
       <postal>
         <street>821 Alder Drive,</street>
         
        <region>MILPITAS, CALIFORNIA 95035</region>
        
        <country>UNITED STATES</country>
       </postal>
       
       <phone></phone>
       <email>mankamis@cisco.com</email>
      </address>
    </author>

    <author initials="Samir" surname="Thoria" 
        fullname="Samir Thoria">
      <organization>Cisco Systems</organization>
        <address>
            <postal>
                <street>821 Alder Drive,</street>

                <region>MILPITAS, CALIFORNIA 95035</region>

                <country>UNITED STATES</country>
            </postal>

            <phone></phone>
            <email>sthoria@cisco.com</email>
        </address>
    </author>

        <author initials="Jorge" surname="Rabadan"
        fullname="Jorge Rabadan">
      <organization>Nokia</organization>
        <address>
            <postal>
                <street>777 E. Middlefield Road</street>

                <region>Mountain View, CA 94043</region>

                <country>UNITED STATES</country>
            </postal>

            <phone></phone>
            <email>jorge.rabadan@nokia.com</email>
        </address>
    </author>

        <author initials="John" surname="Drake"
        fullname="John Drake">
      <organization>Juniper Networks</organization>
        <address>
            <postal>
                <street></street>

                <region></region>

                <country></country>
            </postal>

            <phone></phone>
            <email>jdrake@juniper.net</email>
        </address>
    </author>


    <date year="2024"/>    
    <area>Routing</area>
    <workgroup>BESS WorkGroup</workgroup>
    <abstract>
        <t>
            <xref target="RFC7432"/> describes a mechanism to elect a Designated Forwarder (DF) 
            at the granularity of (ESI, EVI), which is per VLAN (or per group of VLANs in the case 
            of a VLAN bundle or VLAN-aware bundle service). However, per-VLAN granularity is not
            adequate for some applications. <xref target="RFC8584"/> 
            improves the baseline DF election by introducing Highest Random Weight (HRW) DF election. <xref target="RFC9251"/> 
            describes the applicability of EVPN to multicast flows, the routes used to synchronize them, and a default DF election. 
            This document extends the HRW procedures of <xref target="RFC8584"/> 
            to multicast flows so that DF election is performed at the granularity  
            of (ESI, VLAN, Mcast flow).
        </t>
    </abstract>
  </front>

  <!-- ***** MIDDLE MATTER ***** -->

  <middle>
      <section title="Introduction">
          <t>
            EVPN-based All-Active multi-homing is becoming the basic building 
            block for providing redundancy in next-generation data center 
            deployments as well as in
            service provider access/aggregation networks.
            <xref target="RFC7432"/> defines the role of a designated forwarder (DF)
            as the node in the redundancy group that is responsible for forwarding 
            Broadcast, Unknown unicast, and Multicast (BUM) traffic to the Ethernet 
            Segment (CE device or network) in All-Active multi-homing. 
        </t>
        <t>
              The default DF election mechanism allows selecting a DF at the
              granularity of (ES, VLAN) or (ES, VLAN bundle) for BUM traffic. 
              While <xref target="RFC8584"/> improves
              on the default DF election procedure, some service provider residential 
              applications, in which all multicast flows are delivered on a single VLAN,
              require a finer granularity.
          </t> 
          <figure  >
            <preamble/>
              <artwork ><![CDATA[            

                            (Multicast sources)
                                     |
                                     |
                                   +---+
                                   |CE4|
                                   +---+
                                     |
                                     |
                               +-----+-----+
                  +------------|   PE-1    |------------+
                  |            |           |            |
                  |            +-----------+            |
                  |                                     |
                  |                   EVPN              |
                  |                                     |
                  |                                     |
                  | (DF)                           (NDF)|
            +-----------+                        +-----------+
            |  |EVI-1|  |                        |  |EVI-1|  |
            |   PE-2    |------------------------|   PE-3    |
            +-----------+                        +-----------+
                   AC1  \                       / AC2                     
                         \                     /                     
                          \      ESI-1        /                     
                           \                 /                     
                            \               /                     
                            +---------------+
                            |    CE2        |
                            +---------------+
                                   |
                                   |
                          (Multiple receivers)


                Figure 1: Multi-homing Network of EVPN 
                          for IPTV deployments
                  ]]></artwork>
              <postamble></postamble>
          </figure>     
          <t> Consider the above topology, which shows a typical residential deployment 
              scenario in which multiple receivers are behind an all-active 
              multihomed segment. All of the multicast traffic is provisioned 
              on EVI-1. Assume PE-2 is elected as the DF. According to 
              <xref target="RFC7432"/>, PE-2 is then responsible for forwarding 
              all multicast traffic to that Ethernet segment. This leads to the following problems:

              <list style="symbols">
                  <t>
                      Placing the sole data plane forwarding responsibility on 
                      PE-2 is a limitation of the current DF election mechanisms. In the topology
                      of Figure 1, only one PE is ever 
                      elected as DF for the VLAN, irrespective of whether the DF election 
                      mechanism in use is the one defined in <xref target="RFC7432"/> 
                      or in <xref target="RFC8584"/>.
                  </t>
                  <t> The problem may also manifest itself in a different way. For example, 
                      suppose AC1 uses 80% of its available bandwidth to forward unicast data 
                      and multicast receivers must now be served with traffic that requires 
                      more than 20% of the AC1 bandwidth. In this case, AC1 becomes oversubscribed and 
                      multicast traffic is dropped, even though another 
                      link (AC2) is present in the network and could be used to
                      load-balance the multicast traffic more efficiently. 
                  </t>
              </list>
              This document proposes an extension to the HRW procedures of <xref target="RFC8584"/> that 
              allows DF election at the granularity of (ESI, VLAN, Mcast flow), so that 
              multicast flows are better distributed among the PEs of the redundancy group 
              and the load is shared.

          </t>
      </section>     

      <section title="Terminology">
          <t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
              "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
              document are to be interpreted as described in <xref target="RFC2119"/>.
          </t>
          <t>This document uses the EVPN terminology defined in 
              <xref target="RFC7432"/> and the multicast terminology defined in 
              <xref target="RFC4601"/>.
          </t>
      </section>
      <section anchor="df-type" title="The DF Election Extended Community">
          <t> <xref target="RFC8584"/> defines the DF Election Extended 
              Community, which the PEs in a redundancy group use to reach a consensus on 
              which DF election procedure to use.
              A PE notifies the other participating PEs in the redundancy group of 
              its willingness to support per multicast flow DF election 
              by signaling the DF Election Extended Community along with the
              Ethernet Segment route (Route Type 4). This document 
              extends the existing extended community defined 
              in <xref target="RFC8584"/> by 
              defining new DF types.
              <list style="symbols">
                  <t>DF type (1 octet) - Encodes the DF Election algorithm 
                      values (between 0 and 255) that the advertising PE 
                      desires to use for the ES. This document requests two new types in the DF type field:  
                      <list style="symbols">
                       
                          <t> Type 4: HRW-based per (S,G) multicast flow DF election (explained in this document)
                         </t>
                          <t> Type 5: HRW-based per (*,G) multicast flow DF election (explained in this document)
                         </t>

                      </list>
                  </t>
                  <t>  <xref target="RFC8584"/> 
                      describes the encoding of capabilities associated with the DF 
                      election algorithm in the Bitmap field. When these 
                      capability bits are set along with DF type 4 or DF type 5, 
                      they need to be interpreted in the
                      context of that DF type. For example, consider a 
                      scenario where all PEs in the same redundancy group 
                      (same ES) support AC-DF, DF type 4, and DF type 5, and 
                      receive such indications from the other PEs in the 
                      ES. In this scenario, if a VLAN is not active on a PE, 
                      then the DF election procedure on all PEs in the ES 
                      should take that into account and exclude that PE from the 
                      per multicast flow DF election.   
                  </t>
                  <t> A PE SHOULD attach the DF Election Extended Community to the ES route,
                      and MUST attach it if the ES is locally configured 
                      for per multicast flow DF election. Only one DF Election 
                      Extended Community can be sent along with an ES route.
                  </t>
                  <t> When a PE has received the ES routes from all the other PEs for 
                      the ES, it checks whether all of the other PEs have advertised their 
                      desire to use per multicast flow DF election. 
                      If all peering PEs have done so, it performs DF 
                      election based on the per multicast flow procedure. However, if either of the following holds: 
                      <list style="symbols">
                          <t>At least one PE advertised an ES route (Route Type 4) that 
                      does not indicate its capability to perform per multicast flow 
                      DF election, or 
                         </t>
                         <t> At least one PE signals single-active in the AD per ES route,
                         </t>
                 </list>
                  then this MUST be treated as an indication that only the default DF election of
                  <xref target="RFC7432"/> is supported, and the DF election procedure in <xref target="RFC7432"/> MUST be used. 
                  </t>
              </list>
              
          </t>
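          <t> The fallback rule above can be sketched as follows. This is an
              illustrative sketch only, not a normative procedure: the names
              EsRouteInfo and select_df_type are hypothetical, the numeric values
              4 and 5 are the types requested by this document, and the sketch
              assumes that "all PEs agree" means every PE in the redundancy group
              advertised the same per multicast flow DF type. Any other outcome,
              including other algorithms of <xref target="RFC8584"/>, is reported
              here simply as the default election.
          </t>
          <figure>
              <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional, Sequence

DF_TYPE_DEFAULT = 0           # default DF election of RFC 7432
DF_TYPE_PER_SG_FLOW = 4       # value requested by this document
DF_TYPE_PER_STAR_G_FLOW = 5   # value requested by this document
PER_FLOW_TYPES = (DF_TYPE_PER_SG_FLOW, DF_TYPE_PER_STAR_G_FLOW)


@dataclass(frozen=True)
class EsRouteInfo:
    """Hypothetical summary of what one PE advertised for the ES."""
    pe_address: str
    df_type: Optional[int]       # None: no DF Election Extended Community
    single_active: bool = False  # as signaled in the AD per ES route


def select_df_type(local: EsRouteInfo,
                   peers: Sequence[EsRouteInfo]) -> int:
    """Return the per-flow DF type only if every PE agrees on it."""
    members = [local, *peers]
    # Any PE signaling single-active forces the default election.
    if any(m.single_active for m in members):
        return DF_TYPE_DEFAULT
    # The local PE must itself be configured for a per-flow type.
    if local.df_type not in PER_FLOW_TYPES:
        return DF_TYPE_DEFAULT
    # Assumption: agreement means the same per-flow type on every PE.
    if all(m.df_type == local.df_type for m in members):
        return local.df_type
    return DF_TYPE_DEFAULT
```
              ]]></artwork>
          </figure>
          <t> In this sketch, a single peer whose ES route carries no DF
              Election Extended Community (df_type set to None) is enough to make
              every PE fall back to the default election.
          </t>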
      </section>

          <section title="HRW base per multicast flow EVPN DF election">
              <t> This document is an extension of 
                  <xref target="RFC8584"/>, so it 
                  does not repeat the description of the HRW algorithm itself.
              </t>
              <t> An EVPN PE discovers the redundancy group as specified in 
                  <xref target="RFC7432"/>. If the redundancy group consists of 
                  N peering EVPN PE nodes, then after discovery each PE builds an 
                  unordered list of the IP addresses of all the nodes in the redundancy 
                  group. The procedure defined in this document does not require 
                  the list of PEs to be ordered. Address(i) denotes the IP address 
                  of the i-th EVPN PE in the redundancy group, where (0 &lt; i &lt;= N). 
              </t>
              <section anchor="v3-df" title="DF election for IGMP (S,G) membership request ">
                  <t> The DF is the PE that has the maximum weight for (S, G, V, Es), where 
                      <list style="symbols">
                          <t>S - Multicast Source </t> 
                          <t>G - Multicast Group </t>
                          <t>V - VLAN ID.</t>
                          <t>Es - Ethernet Segment Identifier</t>
                      </list>
                  </t>
                  <t>
                      Address(i) is the IP address of the i-th PE. The length of the PE's IP address does not matter, because only the low-order 31 bits are significant after the modulo operation.
                      <list style="numbers">
                          <t> Weight
                                  <list style="symbols">
                                      <t> The weight of PE(i) for (S,G,V,Es) is calculated by the function
                                          Weight(S,G,V,Es,Address(i)), where (0 &lt; i &lt;= N) and PE(i)
                                          is the PE at ordinal i.</t>
                                      <t> Weight(S,G,V,Es,Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(S,G,V,Es)) + 12345) mod 2^31
                                      </t>
                                      <t> In case of a tie, the PE whose IP address is numerically least is chosen.
                                      </t>
                                  </list>
                          </t>
                          <t> Digest
                              <list style="symbols">
                                  <t>D(S,G,V, Es) = CRC_32(S,G,V, Es)</t>
                                  <t> Here D(S,G,V,Es) is the 31-bit digest (the CRC_32 with its most significant bit discarded) of the source
                                      IP address, group IP address, VLAN ID, and Es. The CRC MUST be computed as if the architecture were in network
                                      byte order (big-endian).</t>
                              </list>
                          </t>
                      </list>
                  </t>
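                  <t> The weight and digest computation for an (S,G) flow can be
                      sketched as follows. This is an illustration under stated
                      assumptions, not a reference implementation: the function names
                      are hypothetical, CRC_32 is assumed to be the common CRC-32
                      exposed by Python's zlib module, and the digest input is assumed
                      to be the concatenation of the source address, the group address,
                      a 4-octet VLAN ID, and the 10-octet ESI, all in network byte
                      order. This document does not fix that byte layout.
                  </t>
                  <figure>
                      <artwork><![CDATA[
```python
import ipaddress
import struct
import zlib

MASK31 = (1 << 31) - 1  # keeps the low-order 31 bits ("mod 2^31")


def digest_sg(source, group, vlan_id, esi):
    """D(S,G,V,Es): CRC-32 with the most significant bit discarded.

    Assumed key layout: source | group | 4-octet VLAN ID | ESI,
    all in network byte order.
    """
    key = (ipaddress.ip_address(source).packed
           + ipaddress.ip_address(group).packed
           + struct.pack("!I", vlan_id)
           + esi)
    return zlib.crc32(key) & MASK31


def weight(pe_address, digest):
    """Weight of one PE for a flow digest, per the formula above."""
    addr = int(ipaddress.ip_address(pe_address))
    inner = (1103515245 * addr + 12345) ^ digest
    return (1103515245 * inner + 12345) & MASK31


def elect_df_sg(pe_addresses, source, group, vlan_id, esi):
    """Pick the PE with maximum weight; ties go to the lowest address."""
    d = digest_sg(source, group, vlan_id, esi)
    return min(pe_addresses,
               key=lambda a: (-weight(a, d),
                              int(ipaddress.ip_address(a))))
```
                      ]]></artwork>
                  </figure>
                  <t> Because the key of the comparison depends only on the PE address
                      and the flow, every PE that runs elect_df_sg over the same set of
                      addresses obtains the same DF, whatever the order of its local list.
                  </t>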
              </section>
              <section anchor="v2-df" title="DF election for IGMP (*,G) membership request ">
                  <t> The DF is the PE that has the maximum weight for (G, V, Es), where 
                      <list style="symbols">
                          <t>G - Multicast Group </t>
                          <t>V - VLAN ID.</t>
                          <t>Es - Ethernet Segment Identifier</t>
                      </list>
                  </t>
                  <t>
                      Address(i) is the IP address of the i-th PE. The length of the PE's IP address does not matter, because only the low-order 31 bits are significant after the modulo operation.
                      <list style="numbers">
                          <t> Weight
                                  <list style="symbols">
                                      <t> The weight of PE(i) for (G,V,Es) is calculated by the function
                                          Weight(G,V,Es,Address(i)), where (0 &lt; i &lt;= N) and PE(i)
                                          is the PE at ordinal i.</t>
                                      <t> Weight(G,V,Es,Address(i)) = (1103515245 * ((1103515245 * Address(i) + 12345) XOR D(G,V,Es)) + 12345) mod 2^31
                                      </t>
                                      <t> In case of a tie, the PE whose IP address is numerically least is chosen.
                                      </t>
                                  </list>
                          </t>
                          <t> Digest
                              <list style="symbols">
                                  <t>D(G,V, Es) = CRC_32(G,V, Es)</t>
                                  <t> Here D(G,V,Es) is the 31-bit digest (the CRC_32 with its most significant bit discarded) of the 
                                      group IP address, VLAN ID, and Es. The CRC MUST be computed as if the architecture were in network
                                      byte order (big-endian).</t>
                              </list>
                          </t>
                      </list>
                  </t>
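                  <t> The (*,G) variant differs only in that the source address is left
                      out of the digest. The following sketch computes the DF for each
                      of a set of groups and counts how many groups each PE serves,
                      which is one way to inspect how flows spread over the redundancy
                      group. The function names are hypothetical, and the digest input
                      layout (group address, 4-octet VLAN ID, 10-octet ESI, hashed with
                      the CRC-32 of Python's zlib module) is an assumption of this
                      sketch.
                  </t>
                  <figure>
                      <artwork><![CDATA[
```python
import ipaddress
import struct
import zlib
from collections import Counter

MASK31 = (1 << 31) - 1  # keeps the low-order 31 bits ("mod 2^31")


def digest_star_g(group, vlan_id, esi):
    """D(G,V,Es): CRC-32 with the most significant bit discarded."""
    key = (ipaddress.ip_address(group).packed
           + struct.pack("!I", vlan_id)   # assumed 4-octet VLAN field
           + esi)
    return zlib.crc32(key) & MASK31


def weight(pe_address, digest):
    """Weight of one PE for a flow digest, per the formula above."""
    addr = int(ipaddress.ip_address(pe_address))
    inner = (1103515245 * addr + 12345) ^ digest
    return (1103515245 * inner + 12345) & MASK31


def elect_df_star_g(pe_addresses, group, vlan_id, esi):
    """Pick the PE with maximum weight; ties go to the lowest address."""
    d = digest_star_g(group, vlan_id, esi)
    return min(pe_addresses,
               key=lambda a: (-weight(a, d),
                              int(ipaddress.ip_address(a))))


def df_load(pe_addresses, groups, vlan_id, esi):
    """Count, per PE, the number of (*,G) flows for which it is DF."""
    return Counter(elect_df_star_g(pe_addresses, g, vlan_id, esi)
                   for g in groups)
```
                      ]]></artwork>
                  </figure>
                  <t> The resulting counts depend on the addresses and groups used, so
                      no particular split is implied by this sketch.
                  </t>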
              </section>

              <section anchor="default" title="Default DF election procedure">
                  <t> The per multicast flow DF election procedure is applicable only 
                      once hosts behind an Attachment Circuit (of the Es) start sending IGMP membership 
                      requests. Membership requests are synchronized using the procedure defined in <xref target="RFC9251"/>, and each 
                      PE in the redundancy group can then run the per-flow DF election and create DF state per multicast flow. 
                      The HRW DF election "Type 1" procedure defined in <xref target="RFC8584"/> MUST be used for the Es DF election and SHOULD be performed on the Es even 
                      before any multicast membership request state is learned. This default election procedure MUST be used at the port level, but its result is overridden by the per-flow DF 
                      election as new membership request state is learned.
                  </t>
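                  <t> The relationship between the default election and the per-flow
                      state can be sketched as a lookup in which per-flow entries take
                      precedence over the port-level default. The class and method
                      names are hypothetical, and the sketch assumes the elected DFs
                      have already been computed elsewhere.
                  </t>
                  <figure>
                      <artwork><![CDATA[
```python
from typing import Dict, Optional, Tuple

# (source, group); source is None for a (*,G) membership request.
FlowKey = Tuple[Optional[str], str]


class EsForwardingState:
    """Hypothetical DF state kept by one PE for one Ethernet Segment."""

    def __init__(self, local_pe: str, default_df: str) -> None:
        self.local_pe = local_pe
        # Result of the default (port-level) election, available
        # before any membership request state is learned.
        self.default_df = default_df
        self.per_flow_df: Dict[FlowKey, str] = {}

    def learn_flow(self, flow: FlowKey, elected_df: str) -> None:
        """Record the per-flow election result for a learned flow."""
        self.per_flow_df[flow] = elected_df

    def is_df(self, flow: Optional[FlowKey] = None) -> bool:
        """True if this PE forwards the given traffic to the ES.

        Flows without per-flow state, and BUM traffic in general
        (flow=None), follow the default election.
        """
        if flow is not None and flow in self.per_flow_df:
            return self.per_flow_df[flow] == self.local_pe
        return self.default_df == self.local_pe
```
                      ]]></artwork>
                  </figure>
                  <t> With this structure, a PE that won the default election stays DF
                      for all traffic except those flows for which a per-flow election
                      selected a different PE.
                  </t>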
              </section>
          </section>

          <section title="Procedure to use per multicast flow DF election algorithm  ">
              <figure  align="center">
                  <artwork align="center"><![CDATA[

                                     Multicast  Source
                                             |
                                             |
                                             |
                                             |
                                         +---------+
                          +--------------+  PE-4   +--------------+
                          |              |         |              |
                          |              +---------+              |
                          |                                       |
                          |              EVPN CORE                |
                          |                                       |
                          |                                       |
                          |                                       |
                      +---------+        +---------+         +---------+
                      |  PE-1   +--------+   PE-2  +---------+   PE-3  |
                      |  EVI-1  |        |  EVI-1  |         | EVI-1   |
                      +---------+        +---------+         +---------+
                           |__________________|___________________|     
                         AC-1    ESI-1        | AC-2               AC-3
                                         +---------+
                                         |  CE-1   |
                                         |         |
                                         +---------+
                                              |
                                              |
                                              |
                                              |
                                      Multicast Receivers

                      Figure-2 : Multihomed network   
                      ]]></artwork>
                  <postamble></postamble>
              </figure>

              <t> Figure-2 shows a multihomed network in which CE-1 is multihomed
                  to EVPN PE-1, PE-2, and PE-3. Multiple multicast receivers are behind 
                  the all-active multihomed segment. 
                  <list style="numbers">
                      <t> PEs connected to the same Ethernet segment 
                          automatically discover each other through the exchange 
                          of Ethernet Segment routes. This document does not change 
                          this procedure; it uses the procedure defined in 
                          <xref target="RFC7432"/>.
                      </t>
                      <t> Each PE in the redundancy group advertises an Ethernet 
                          Segment route with the extended community indicating its 
                          ability to participate in the per multicast flow 
                          DF election procedure. Since per multicast flow DF election
                          is not applicable until a PE learns about a
                          membership request from a receiver, a 
                          default DF election among the PEs in the redundancy 
                          group is needed for BUM traffic. Until multicast membership state is learned, the DF election procedure in
                          <xref target="default"/> is used, namely HRW per (V,Es) as defined in <xref target="RFC8584"/>.
 
                      </t>
                      <t> When a receiver starts sending membership requests for (s1,g1), 
                          where s1 is the multicast source address and g1 is the multicast 
                          group address, CE-1 may hash the membership request (IGMP join) 
                          to any of the PEs in the
                          redundancy group. Assume it is hashed to PE-2. 
                          <xref target="RFC9251"/> defines a 
                          procedure to synchronize IGMP join state 
                          among the PEs of the redundancy group. Each PE now 
                          has information about the membership request (s1,g1), and each 
                          of them runs the DF election procedure in <xref target="v3-df"/> to
                          elect the DF among the participating PEs in the redundancy group. 
                          Assume PE-2 is elected as the DF for multicast flow (s1,g1) and PE-1 is the DF from the default election. Then:
                          <list style="numbers">
                              <t> PE-1 forwarding state is nDF for flow (s1,g1) and 
                                  DF for all other BUM traffic.</t>
                              <t> PE-2 forwarding state is DF for flow (s1,g1) and 
                                  nDF for all other BUM traffic.</t>
                              <t> PE-3 forwarding state is nDF for flow (s1,g1) and 
                                  for all other BUM traffic.</t>
                          </list>
                      </t>
                      <t> As each new multicast membership request arrives, 
                          the same procedure is repeated.
                      </t>
                      <t> If the DF type signaled as described in <xref target="df-type"/> is type 4, a PE MUST use the procedure in <xref target="v3-df"/> to 
                          elect the DF among the participating PEs for an (S,G) membership request, and MUST use the procedure in <xref target="v2-df"/> for a (*,G) membership request.
                      </t>
                  </list>
              </t>
          </section>
          <section title="Triggers for DF re-election">
              <t> Several events can trigger DF re-election, 
                  including the following: 
                  <list style="numbers">
                      <t> Local ES going down due to physical failure or 
                          configuration change triggers DF re-election at peering PE.
                      </t>
                      <t> Detection of new PE through ES route.   
                      </t>
                      <t> AC going up / down
                      </t>
                      <t> ESI change 
                      </t>
                      <t> Remote PE removed / Down 
                      </t>
                      <t> Local configuration change of DF election Type and peering PE consensus on new DF Type
                      </t>
                  </list>
                  This document does not define any new mechanism to handle DF 
                  re-election. It uses the existing mechanism defined 
                  in <xref target="RFC7432"/>. Whenever any of these triggers occurs, 
                  DF re-election is performed and all of the flows are 
                  redistributed among the PEs currently in the redundancy group for the ES.
              </t>
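              <t> The effect of one such trigger, the removal of a PE, can be
                  sketched by re-running the election for every flow over the
                  remaining PEs. Because each flow independently selects the PE with
                  the maximum weight, a flow whose DF is still present keeps that DF;
                  only flows whose DF was the removed PE change. The function names
                  are hypothetical, and the digest input layout and the use of the
                  CRC-32 from Python's zlib module are assumptions of this sketch.
              </t>
              <figure>
                  <artwork><![CDATA[
```python
import ipaddress
import struct
import zlib

MASK31 = (1 << 31) - 1  # keeps the low-order 31 bits ("mod 2^31")


def weight(pe_address, digest):
    """Weight of one PE for a flow digest (HRW formula)."""
    addr = int(ipaddress.ip_address(pe_address))
    inner = (1103515245 * addr + 12345) ^ digest
    return (1103515245 * inner + 12345) & MASK31


def elect(pe_addresses, source, group, vlan_id, esi):
    """Per (S,G) flow election; ties go to the lowest address."""
    key = (ipaddress.ip_address(source).packed
           + ipaddress.ip_address(group).packed
           + struct.pack("!I", vlan_id)   # assumed 4-octet VLAN field
           + esi)
    d = zlib.crc32(key) & MASK31
    return min(pe_addresses,
               key=lambda a: (-weight(a, d),
                              int(ipaddress.ip_address(a))))


def moved_flows(pe_addresses, removed_pe, flows, vlan_id, esi):
    """Map each flow whose DF changes to its (old DF, new DF) pair."""
    survivors = [p for p in pe_addresses if p != removed_pe]
    moved = {}
    for source, group in flows:
        before = elect(pe_addresses, source, group, vlan_id, esi)
        after = elect(survivors, source, group, vlan_id, esi)
        if before != after:
            moved[(source, group)] = (before, after)
    return moved
```
                  ]]></artwork>
              </figure>
              <t> In this sketch, every entry returned by moved_flows has the
                  removed PE as its old DF, so flows served by the surviving PEs are
                  not disturbed by the re-election.
              </t>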
          </section>

     <section title="Security Considerations">
         <t>The same Security Considerations described in <xref target="RFC7432"/>
             are valid for this document.
         </t>
     </section>

     <section title="IANA Considerations">
         <t> This document requests that IANA allocate the DF types described in <xref target="df-type"/> in the DF Election Extended Community for EVPN.
         </t>
 </section>

      <section title="Acknowledgement">
          <t>The authors would like to acknowledge the helpful comments
   and contributions of Luc Andre Burdet.
          </t>
      </section>

    
    
  </middle>

  <!--  *****BACK MATTER ***** -->

  <back>

      <references title='Normative References'>
          <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
          <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.7432.xml"/>
          <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.4601.xml"/>
          <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.9251.xml"/>
          <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8584.xml"/>
      </references>

  </back>
</rfc>
