Add ZEBRA NHG recursive handling #16028
Conversation
Force-pushed from ec14152 to 5004fba
ci:rerun
Sorry if I misunderstood something, but how is the SRTE color related to this NHG recursive handling work? Isn't that a separate feature/fix?
This pull request has conflicts; please resolve them before we can evaluate the pull request.
There is a CPU issue in ZEBRA when BGP installs and removes a lot of routes at the same time: the vtysh and shell become unreachable. This is the case in BGP failover scenarios with two peers, when one of the peers becomes unreachable. For each route change, it appears that nexthop tracking is called to check the impact of the new route (un)availability. Two observations were made:
- In the case of a specific route change, if a covering route exists (such as the default route present in this setup), then nexthop tracking is called. There is no need to call nexthop tracking again for the same default prefix, knowing that the dplane_result thread handles routes in bulk.
- The first picture from the below link indicates that nexthop tracking consumes time, and keeping this activity in the zebra main thread will still result in STARVATION messages.
Propose to separate the nht notifications from the dplane_result thread by creating a queue list that stores the prefixes to evaluate against nexthop tracking. Before enqueuing a prefix, a check is done that the same prefix has not been queued before. The processing is done in a separate 'rib_process_nht_thread_loop' function call. Link: FRRouting#16028 Signed-off-by: Philippe Guibert <[email protected]>
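The dedup-before-enqueue idea described above can be sketched language-agnostically. The following is a Python toy model with illustrative names, not FRR's actual C code: a prefix already pending evaluation is not queued a second time, and a drain loop stands in for the separate processing thread.

```python
from collections import deque

class NhtQueue:
    """Toy model of a deduplicated nexthop-tracking work queue.

    Prefixes are queued for NHT evaluation; a prefix that is already
    pending is skipped. (Names are illustrative, not FRR's.)
    """
    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def enqueue(self, prefix):
        if prefix in self._pending:   # same prefix already awaiting evaluation
            return False
        self._pending.add(prefix)
        self._queue.append(prefix)
        return True

    def process_loop(self, evaluate):
        """Drain the queue, evaluating each distinct pending prefix once."""
        while self._queue:
            prefix = self._queue.popleft()
            self._pending.discard(prefix)
            evaluate(prefix)

q = NhtQueue()
assert q.enqueue("0.0.0.0/0")
assert not q.enqueue("0.0.0.0/0")   # duplicate default prefix is skipped
q.enqueue("192.168.3.0/24")
seen = []
q.process_loop(seen.append)
print(seen)  # ['0.0.0.0/0', '192.168.3.0/24']
```

After the loop drains, the same prefix may be queued again, which is what allows later route changes to re-trigger evaluation.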
There is a CPU issue in ZEBRA when BGP installs and removes a lot of routes at the same time: the vtysh and shell become unreachable. This is the case in BGP failover scenarios with two peers, when one of the peers becomes unreachable. For each route change, some route notifications may be sent. This task is CPU-intensive, and performs I/O operations when calling the ZAPI socket to notify daemons of the route (un)install success or failure. The second picture of the below link illustrates it. In order to reduce the time taken in the dplane_result thread, propose to separate these notifications from the dplane_result thread by creating a queue list that stores the routes and the client that requested to (un)install each route. The processing is done in a separate 'zebra_route_process_notify_thread_loop()' function call. Link: FRRouting#16028 Signed-off-by: Philippe Guibert <[email protected]>
Force-pushed from 5004fba to 0e4ce6b
Force-pushed from 452a014 to 1e64cc6
Force-pushed from 3c117b3 to 29ce92f
This pull request effectively extends nexthop groups with recursive support.
I separated out the work that is not related to recursion and SRTE.
ci:rerun
Force-pushed from 9ffb19d to 5f4c2f3
This small rework prepares the next commit. Signed-off-by: Philippe Guibert <[email protected]>
Nexthop-groups do not support route recursion: if a nexthop is not directly connected, it is not resolved against other types of routes, and the nexthop-group is always marked inactive regardless of whether the nexthop could be resolved. ROUTE_ADD ZAPI messages from daemons can convey a ZEBRA_FLAG_ALLOW_RECURSION flag to tell zebra whether it should resolve nexthops that are not directly connected; the flag is unset by default. In a similar way, add a NEXTHOP_GROUP_ALLOW_RECURSION flag for the NHG_ADD ZAPI message, which is unset by default. Signed-off-by: Philippe Guibert <[email protected]>
All protocol daemons that configure a nexthop-group with a non directly connected nexthop will create an active nexthop-group. This is wrong, as the nexthop may not be reachable, and the nexthop-group should be considered inactive. Today, protocol nexthop-groups are always considered active at ZEBRA level, whereas they should be submitted to the nexthop reachability mechanism that is used when a route is configured. For instance, when BGP sends a ROUTE_ADD message with the nexthop of the BGP update to ZEBRA, ZEBRA resolves the prefix reachability by looking at the nexthop reachability: the nexthop_active(prefix, nexthop) function is called. Fix this by calling the nexthop_active() function when the nexthop is not an interface nexthop. Note that without a prefix to apply the nexthop to, it will not be possible to do some specific checks like recursion against ourselves. For example, the below 192.168.3.0/24 BGP prefix is resolved over itself.
> # show bgp ipv4
> [..]
> *>i 192.168.3.0/24 192.168.3.3 0 100 0 i
> # show ip route 192.168.3.3
> Routing entry for 192.168.3.0/24
> Known via "bgp", distance 200, metric 0
> Last update 00:00:34 ago
> 192.168.3.3 inactive, weight 1
> Routing entry for 192.168.3.0/24
> Known via "static", distance 1, metric 0, best
> Last update 00:00:36 ago
> * 192.168.5.10, via r1-eth2, weight 1
> # output
> 2024/06/12 16:15:47.656466 ZEBRA: [ZJVZ4-XEGPF] default(0:254):192.168.3.0/24: Examine re 0x55d446afd0c0 (bgp) status: Changed flags: Recursion iBGP dist 200 metric 0
> 2024/06/12 16:15:47.656470 ZEBRA: [YPVDZ-KQPM9] nexthop_active: Matched against ourself and prefix length is not max bit length
For those corner cases, using protocol nexthop-groups will not be possible: each protocol will have to check those cases. Signed-off-by: Philippe Guibert <[email protected]>
Add the ability for the sharpd daemon to program nexthop-groups that are unresolved: this includes blackhole routes, but also nexthops that have no interface information. Signed-off-by: Philippe Guibert <[email protected]>
In sharpd, configuring a nexthop-group with an IP nexthop that is not directly connected does create an inactive NHG context in zebra: > ubuntu2204(config)# interface loop1 > ubuntu2204(config-if)# ip address 192.0.2.1/24 > ubuntu2204(config-if)# exi > ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100 > ubuntu2204(config)# nexthop-group ABCD > ubuntu2204(config-nh-group)# nexthop 10.200.0.62 > 2024/01/17 16:52:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168 > ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168 > ID: 181818168 (sharp) > RefCnt: 1 > Uptime: 00:00:04 > VRF: default > Depends: (841) > via 10.200.0.62 (vrf default) inactive, weight 1 Add the 'allow-recursion' vty command under nexthop-group configuration. When set, the nexthop-group ABCD is added or updated, and will update the nexthop resolution as expected. > ubuntu2204(config)# interface loop1 > ubuntu2204(config-if)# ip address 192.0.2.1/24 > ubuntu2204(config-if)# exi > ubuntu2204(config)# ip route 10.200.0.0/24 192.0.2.100 > ubuntu2204(config)# nexthop-group ABCD > ubuntu2204(config-nh-group)# allow-recursion > ubuntu2204(config-nh-group)# nexthop 10.200.0.62 > 2024/01/17 16:57:44 SHARP: [JWRCN-N9K90] Installed nhg 181818168 > ubuntu2204(config-nh-group)# do show nexthop-group rib 181818168 > ID: 181818168 (sharp) > RefCnt: 1 > Uptime: 00:00:04 > VRF: default > Valid, Installed > Depends: (842) > via 10.200.0.62 (vrf default) (recursive), weight 1 > via 192.0.2.100, loop1 (vrf default), weight 1 The allow-recursion flag is disabled by default, as it is today with other control plane daemons. Signed-off-by: Philippe Guibert <[email protected]>
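The allow-recursion behaviour shown above can be modelled with a toy resolver. The following Python sketch uses a simplified routing table; the table layout, function name, and return shape are illustrative, not zebra's structures. Without the flag, a non-connected nexthop stays inactive; with it, the nexthop is resolved through the covering route.

```python
import ipaddress

def resolve_nexthop(nexthop, table, allow_recursion):
    """Toy resolution: `table` maps prefix -> ("connected", ifname) or
    ("recursive", gateway). Returns a (gateway, ifname) pair when the
    nexthop resolves, or None when it must stay inactive."""
    nh = ipaddress.ip_address(nexthop)
    best = None
    # longest-prefix match over the toy table
    for prefix, via in table.items():
        net = ipaddress.ip_network(prefix)
        if nh in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    if best is None:
        return None
    kind, target = best[1]
    if kind == "connected":
        return (nexthop, target)   # directly connected: always resolvable
    if not allow_recursion:
        return None                # recursion not allowed -> inactive
    return resolve_nexthop(target, table, allow_recursion)

# mirrors the vty example above: 10.200.0.0/24 via 192.0.2.100 on loop1
table = {
    "192.0.2.0/24": ("connected", "loop1"),
    "10.200.0.0/24": ("recursive", "192.0.2.100"),
}
assert resolve_nexthop("10.200.0.62", table, allow_recursion=False) is None
print(resolve_nexthop("10.200.0.62", table, allow_recursion=True))
```

With allow_recursion set, 10.200.0.62 resolves through 192.0.2.100 on loop1, matching the recursive nexthop shown in the `show nexthop-group rib` output above.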
Configuring a nexthop-group without allow-recursion and referencing it in a sharp route results in a recursive route. > nexthop-group ABCD > nexthop 1.1.1.1 > exit > exit > sharp install routes 3.3.3.3 nexthop-group ABCD 1 The route is attached to a ZEBRA nexthop-group: > # show ip route 3.3.3.3 nexthop-group > Routing entry for 3.3.3.3/32 > Known via "sharp", distance 150, metric 0, best > Last update 00:00:03 ago > Nexthop Group ID: 17 <--- a zebra NHG is used instead > 1.1.1.1 (recursive), weight 1 > * 10.0.2.2, via mgmt0, weight 1 The used nexthop-group is recursive. > # show nexthop-group rib 17 > ID: 17 (zebra) > RefCnt: 1 > Uptime: 00:00:54 > VRF: default > Valid > Depends: (18) > via 1.1.1.1 (vrf default) (recursive), weight 1 > via 10.0.2.2, mgmt0 (vrf default), weight 1 Actually, sharpd sends a ROUTE_ADD message with the nexthop IP from the ABCD nexthop-group. Then, zebra creates its own nexthop-group. Since zebra nexthop-groups are recursive by default, recursion is processed. Add the 'use-protocol-nexthop-group' keyword to force the sharp route to use the protocol nexthop-group. > nexthop-group EFGH > nexthop 4.4.4.4 > end > sharp install routes 5.5.5.5 nexthop-group EFGH 1 use-protocol-nexthop-group > # show ip route 5.5.5.5 nexthop-group > Routing entry for 5.5.5.5/32 > Known via "sharp", distance 150, metric 0, best > Last update 00:02:01 ago > Nexthop Group ID: 181818168 <-- protocol NHG is used > 4.4.4.4, weight 1 > > # show nexthop-group rib 181818168 > ID: 181818168 (sharp) > RefCnt: 1 > Uptime: 00:00:15 > VRF: default > Depends: (15) > via 1.1.1.1 (vrf default) inactive, weight 1 > [..] Signed-off-by: Philippe Guibert <[email protected]>
Add a nexthop group test that ensures that a recursive next-hop is resolved in zebra. Signed-off-by: Philippe Guibert <[email protected]>
The nexthop_active() code checks a ZEBRA_FLAG_IBGP flag which is passed as a parameter by the caller, but nexthop-groups never use that flag. This flag is used by iBGP when a route is installed to ZEBRA: when it is set, and when recursion is not authorised on that route, only the first of the below traces is displayed. The second trace appears when an eBGP route is installed.
> cli# debug zebra rib detail
> [..]
> 2023/12/17 21:09:43 ZEBRA: [NWWQT-S584X] nexthop_active: Route Type bgp has not turned on recursion 172.31.0.102/32 failed to match
> 2023/12/17 21:09:43 ZEBRA: [PKF9S-7G8XM] EBGP: see "disable-ebgp-connected-route-check" or "disable-connected-check"
Routes using nexthop-groups lack the iBGP flag which is present in routes using nexthops. This commit adds the IBGP flag option in the nexthop-group structure. The value is copied and translated to the nhg_hash_entry structure in ZEBRA. Signed-off-by: Philippe Guibert <[email protected]>
Creating a nexthop-group with a color attribute is not possible. Using such nexthops would help steer traffic to a path forged by the SR-TE daemon, instead of the regular path calculated by the SPF algorithm of IGPs. Passing the color information with the zapi_route structure is possible by using the message attribute, but this attribute is missing in the zapi_nhg structure. Add a 32-bit message attribute to the zapi_nhg structure. This field is conveyed in the ZAPI message when NHG_ADD is called. The conveyed message flags value is 0 for the moment; it will be set in the next commits. Signed-off-by: Philippe Guibert <[email protected]>
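The idea of carrying an extra 32-bit message field alongside the NHG id can be illustrated with a hypothetical encode/decode pair. This only sketches the concept in Python; it is not FRR's actual ZAPI wire layout or field ordering.

```python
import struct

def encode_zapi_nhg_header(nhg_id, message):
    """Hypothetical wire sketch: a 32-bit NHG id followed by the new
    32-bit `message` attribute, both network byte order. Illustrative
    only; FRR's real zapi_nhg encoding differs."""
    return struct.pack("!II", nhg_id, message)

def decode_zapi_nhg_header(buf):
    """Inverse of the sketch above."""
    nhg_id, message = struct.unpack("!II", buf[:8])
    return nhg_id, message

buf = encode_zapi_nhg_header(181818168, 0)   # message flags are 0 for now
print(decode_zapi_nhg_header(buf))  # (181818168, 0)
```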
…of nexthop-groups In sharpd, configuring a nexthop-group with a colored nexthop is not possible. Add a new option under nexthop command to configure an srte color: > nexthop-group 1 > [..] > nexthop 192.0.2.100 color 100 Map this option under a message attribute located in the nexthop_group structure. This attribute will be encoded and decoded, by using the message attribute of the zapi_nhg structure. Signed-off-by: Philippe Guibert <[email protected]>
Testing the color of a nexthop requires watching for nexthop tracking events from zebra: they tell whether a path for a colored nexthop exists or not, and it is up to the CP daemon that owns the nexthop-group to refresh it. Add a 'color' option to the 'sharp watch nexthop' command. This call will trigger the nexthop-group to be re-added in zebra. The next commit will introduce the topotest about color. Signed-off-by: Philippe Guibert <[email protected]>
Add a test for the color attribute of nexthop-groups. The marked color modifies the resolved path of the nexthop-group. The test checks the impact of setting the color attribute, and also checks the unconfiguration of the SRTE policy. Signed-off-by: Philippe Guibert <[email protected]>
The nexthop-group library needs to extend the nexthop-group object to include child groups. The current hook handlers use the 'nexthop' wording, whereas the 'nexthop_or_child_group' wording is clearer. Rename the hook handlers add_nexthop and del_nexthop to add_nexthop_or_child_group and del_nexthop_or_child_group. Signed-off-by: Philippe Guibert <[email protected]>
Protocol daemons can create nexthop groups with multiple nexthops, but cannot create a hierarchy between nexthop groups. Having a nexthop group hierarchy is desirable at the protocol level, because of several advantages:
- It can facilitate the updates at ZEBRA level in ECMP cases, like for the BGP addpath functionality: the same prefix may be announced multiple times with a different nexthop. If one of those nexthops becomes unreachable, all the nexthop groups are parsed, and only the ones using the failed nexthop are refreshed.
- This hierarchy representation will be maintained between the protocol daemon and ZEBRA.
The next series of commits introduces support for the nexthop-group hierarchy. This commit adds a new child-group command under the nexthop-group sub node. That group name points to another nexthop group.
> nexthop-group ABCD
> child-group ECMP1
> child-group ECMP2
> exit
> nexthop-group ECMP1
> nexthop 192.168.0.100 loop1
> exit
> nexthop-group ECMP2
> nexthop 192.168.0.105 loop1
> exit
The child-group information is stored, but is not yet used by any CP daemon. Daemons like PBR will ignore notifications related to child-group information. Signed-off-by: Philippe Guibert <[email protected]>
There is no messaging API to convey nexthop group hierarchy information from protocol daemons to the zebra daemon. Add a new ZAPI message (NHG_CHILD_ADD) for sending information about nexthop-groups that have the 'child-group' command configured. The message includes the parent NHG ID, as well as the child NHG IDs. A function API is available, but not yet used by CP daemons, to send the nexthop group information. The message is read by ZEBRA: the nexthop IP information of each child group is appended to the parent NHG ID.
- On the one hand, the nexthop group hierarchy of the protocol daemon is lost in ZEBRA, in favor of a nexthop-group with multiple nexthop IPs.
- On the other hand, the nexthop group hierarchy information is maintained, as the parent nexthop-group does what the protocol wanted.
Signed-off-by: Philippe Guibert <[email protected]>
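The flattening behaviour described here (each child group's nexthops appended to the parent NHG) can be sketched as follows. This is an illustrative Python model, not zebra's implementation; `groups` stands in for zebra's NHG table.

```python
def flatten_child_groups(parent_id, child_ids, groups):
    """Model of handling an NHG_CHILD_ADD: append each child group's
    nexthops to the parent NHG, keeping one copy per nexthop IP.
    `groups` maps NHG id -> list of nexthop IPs (illustrative only)."""
    flattened = []
    for cid in child_ids:
        for nh in groups.get(cid, []):
            if nh not in flattened:   # avoid duplicating a shared nexthop
                flattened.append(nh)
    groups[parent_id] = flattened
    return flattened

# mirrors the ABCD -> {ECMP1, ECMP2} example from the previous commit
groups = {
    1: ["192.168.0.100"],   # ECMP1
    2: ["192.168.0.105"],   # ECMP2
}
print(flatten_child_groups(10, [1, 2], groups))
```

The hierarchy itself is not stored in the flattened result, which matches the trade-off described above: zebra sees one nexthop-group with multiple nexthop IPs.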
The hierarchical nexthop groups need to be tested at the protocol level: the SHARP daemon must be adapted to configure nexthop groups in ZEBRA. Each modified nexthop group must be notified to the parent nexthop groups that use it. The installation of parent nexthop groups must follow their child groups; the removal of child groups must follow the removal of their parent nexthop-group.
- A callback registration mechanism is added in the nexthop-group library to notify parent nexthop groups which own a given child group.
- The configuration order is managed by SHARP.
- SHARP uses the NHG_GROUP_ADD ZAPI message.
There are no checks on the number of ECMP nexthops in a child group, knowing that from the SHARP perspective, one child group equals one nexthop. Signed-off-by: Philippe Guibert <[email protected]>
Add a test for hierarchical nexthop groups. A nexthop group with child nexthop groups is added and modified. Signed-off-by: Philippe Guibert <[email protected]>
A nexthop-group does not display the 'fib' flag value of its nexthops. > nexthop-group A > nexthop 192.168.1.55 loop1 > exit observed: > ubuntu2204# show nexthop-group rib 181818168 json > { > "181818168":{ > "type":"sharp", > "refCount":1, > "uptime":"00:00:17", > "vrf":"default", > "valid":true, > "installed":true, > "depends":[ > 528 > ], > "nexthops":[ > { > "flags":1, > "ip":"192.168.1.55", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "vrf":"default", > "active":true, > "weight":1 > } > ] > } > } The FIB flag is used to inform the user that a given nexthop is installed in the system, which is the case when using iproute2: > # ip nexthop show id 181818168 > id 181818168 group 15 proto 194 > # ip nexthop show id 15 > id 15 via 192.168.1.55 dev loop1 scope link proto 194 Fix this by refreshing the FIB flag value of its nexthops, when the dataplane result indicate the nexthop-group is installed. > ubuntu2204# show nexthop-group rib 181818168 json > { > "181818168":{ > "type":"sharp", > "refCount":1, > "uptime":"00:00:25", > "vrf":"default", > "valid":true, > "installed":true, > "depends":[ > 574 > ], > "nexthops":[ > { > "flags":3, > "fib":true, > "ip":"192.168.1.55", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "vrf":"default", > "active":true, > "weight":1 > } > ] > } > } > Link: FRRouting#16332 Signed-off-by: Philippe Guibert <[email protected]>
When two child nexthops of a nexthop-group resolve over the same nexthop, both resolved nexthops are flagged as installed in the fib, whereas only one of them is.
> conf t
> log file /tmp/ccc.txt
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
> allow-recursion
> nexthop 4.4.4.22
> exit
> nexthop-group B
> allow-recursion
> nexthop 6.6.6.22
> exit
> ! nhid 181818170 creation
> nexthop-group AB
> allow-recursion
> child-group A
> child-group B
> exit
> exit
> ! static route creation to compare with
> configure terminal
> ip route 8.8.8.8/32 4.4.4.11
> ip route 8.8.8.8/32 6.6.6.11
Observed:
> ubuntu2204hwe(config)# do show nexthop-group rib 181818170 json
> {
> "181818170":{
> "type":"sharp",
> "refCount":1,
> "uptime":"00:00:26",
> "vrf":"default",
> "valid":true,
> "installed":true,
> "depends":[
> 45,
> 47
> ],
> "nexthops":[
> {
> "flags":5,
> "ip":"4.4.4.22",
> [..]
> },
> {
> "flags":3,
> "fib":true,
> "ip":"10.0.2.150",
> [..]
> },
> {
> "flags":5,
> "ip":"6.6.6.22",
> [..]
> },
> {
> "flags":19,
> "duplicate":true,
> "fib":true, <- should not be present
> "ip":"10.0.2.150",
> [..]
> }
> ]
> }
> }
Expected: the same as when using a static route, where the last recursive nexthop is unselected.
> ubuntu2204# show ip route 8.8.8.8/32
> Routing entry for 8.8.8.8/32
> Known via "static", distance 1, metric 0, best
> Last update 00:00:14 ago
> 4.4.4.11 (recursive), weight 1
> * 10.0.2.150, via mgmt0, weight 1
> 6.6.6.11 (recursive), weight 1
> 10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1
Fix this by not setting the FIB flag when the DUPLICATE flag is present.
> ubuntu2204# do show nexthop-group rib 181818170 json
> {
> "181818170":{
> [..]
> "nexthops":[
> {
> [..]
> },
> {
> [..]
> },
> {
> [..]
> },
> {
> "flags":17,
> "duplicate":true,
> "ip":"10.0.2.150",
> [..]
> }
> ]
> }
> }
The NHG dataplane result must be checked for duplicates, because the duplicated nexthops were removed before sending the NHG to the dataplane (see the below link). Fix this by updating the duplicate flag on the duplicated nexthops. Also, avoid setting the FIB flag when a duplicate is found. Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp") Signed-off-by: Philippe Guibert <[email protected]>
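The duplicate-flag rule described in this fix can be modelled as follows. This is a Python sketch with illustrative field names (a dict per nexthop, not zebra's flag bitfield), and it assumes the NHG was reported installed by the dataplane: later copies of the same resolved nexthop are marked duplicate and never get the FIB flag.

```python
def mark_duplicates(resolved_nexthops):
    """Mark later occurrences of the same resolved nexthop as duplicate,
    and set the FIB flag only on the first copy (sketch of the rule:
    a DUPLICATE nexthop must never carry the FIB flag)."""
    seen = set()
    out = []
    for nh in resolved_nexthops:
        key = (nh["ip"], nh.get("interface"))
        duplicate = key in seen
        seen.add(key)
        out.append({
            **nh,
            "duplicate": duplicate,
            # installed-in-FIB only for the first copy of a nexthop
            "fib": not duplicate,
        })
    return out

# mirrors the example above: two child groups resolving over 10.0.2.150
hops = [{"ip": "10.0.2.150", "interface": "mgmt0"},
        {"ip": "10.0.2.150", "interface": "mgmt0"}]
result = mark_duplicates(hops)
print([(h["duplicate"], h["fib"]) for h in result])  # [(False, True), (True, False)]
```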
When updating a route with a protocol nexthop-group, the route and nexthop-group contexts have the fib flag value lost. configuration used: > configure terminal > ip route 6.6.6.0/24 192.168.1.44 > ip route 6.6.6.0/24 192.168.1.47 > nexthop-group A > allow-recursion > nexthop 6.6.6.22 > ! fib flag is correctly set on nhid 181818168 > end > sharp install routes 9.9.9.9 nexthop-group A 1 > ! fib flag is correctly set on route 9.9.9.9 nhid 181818168 > configure terminal > ip route 6.6.6.0/24 192.168.1.49 > end > ! re-install route to 9.9.9.9 to trigger update > sharp install routes 9.9.9.9 nexthop-group A 1 observed: > ubuntu2204# do show ip route 9.9.9.9 json > { > "9.9.9.9/32":[ > { > "prefix":"9.9.9.9/32", > [..] > "nexthops":[ > { > [..] > "recursive":true, > }, > { > "flags":1, > "ip":"192.168.1.44", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "active":true, > "weight":1 > }, > { > "flags":1, > "ip":"192.168.1.47", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "active":true, > "weight":1 > }, > { > "flags":1, > "ip":"192.168.1.49", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "active":true, > "weight":1 > } > ] > } > ] > } > > ubuntu2204# show nexthop-group rib 181818168 json > { > "181818168":{ > "type":"sharp", > "refCount":2, > "uptime":"00:06:35", > "vrf":"default", > "valid":true, > "installed":true, > "depends":[ > 324 > ], > "nexthops":[ > { > "flags":5, > "ip":"6.6.6.22", > "afi":"ipv4", > "vrf":"default", > "active":true, > "recursive":true, > "weight":1 > }, > { > "flags":1, > "ip":"192.168.1.44", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "vrf":"default", > "active":true, > "weight":1 > }, > { > "flags":1, > "ip":"192.168.1.47", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "vrf":"default", > "active":true, > "weight":1 > }, > { > "flags":1, > 
"ip":"192.168.1.49", > "afi":"ipv4", > "interfaceIndex":3, > "interfaceName":"loop1", > "resolver":true, > "vrf":"default", > "active":true, > "weight":1 > } > ] > } > } This case happens when the route to update is the same and the nhg is a protocol nexthop-group. Fix this by updating the nexthop flags of the original nhg instead of the nhg from the passed dataplane nhg context. Fixes: 27805e7 ("zebra: Properly set NEXTHOP_FLAG_FIB when skipping install") Signed-off-by: Philippe Guibert <[email protected]>
When installing a route via a nexthop-group that has two identical recursive nexthops, the duplicate flag is not present when protocol nexthop groups are used.
> conf t
> ip route 4.4.4.0/24 10.0.2.150
> ip route 6.6.6.0/24 10.0.2.150
> nexthop-group A
> allow-recursion
> nexthop 4.4.4.22
> exit
> nexthop-group B
> allow-recursion
> nexthop 6.6.6.22
> exit
> nexthop-group AB
> allow-recursion
> child-group A
> child-group B
> exit
> exit
> sharp install route 12.12.12.12 nexthop-group AB 1
Observed:
> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
> Known via "sharp", distance 150, metric 0, best
> Last update 00:00:50 ago
> 4.4.4.22 (recursive), weight 1
> * 10.0.2.150, via mgmt0, weight 1
> 6.6.6.22 (recursive), weight 1
> * 10.0.2.150, via mgmt0, weight 1
Expected:
> ubuntu2204# do show ip route 12.12.12.12
> Routing entry for 12.12.12.12/32
> Known via "sharp", distance 150, metric 0, best
> Last update 00:00:08 ago
> 4.4.4.22 (recursive), weight 1
> * 10.0.2.150, via mgmt0, weight 1
> 6.6.6.22 (recursive), weight 1
> 10.0.2.150, via mgmt0 (duplicate nexthop removed), weight 1
Fix this by refreshing the duplicate flag value of the re's nexthop-group. Signed-off-by: Philippe Guibert <[email protected]>
The DUPLICATE flag value of nhg's nexthop is lost when a route is updated. This is the case with sharp daemon: Initial configuration: > configure terminal > ip route 7.7.7.0/24 10.0.2.68 > ip route 6.6.6.0/24 10.0.2.68 > nexthop-group A > allow-recursion > nexthop 7.7.7.88 > nexthop 6.6.6.33 > end > sharp install routes 12.12.12.12 nexthop-group A 1 Observed: > # show ip route 12.12.12.12/32 json > { > "12.12.12.12/32":[ > { > "prefix":"12.12.12.12/32", > [..] > "nexthops":[ > [..] > { > "flags":17, <--- correct > "duplicate":true, > "ip":"10.0.2.68", > "afi":"ipv4", > "interfaceIndex":2, > "interfaceName":"mgmt0", > "resolver":true, > "active":true, > "weight":1 > } Then update the nexthop resolution > configure terminal > ip route 6.6.6.0/24 10.0.2.88 > end > sharp install routes 12.12.12.12 nexthop-group A 1 Observed: > # show ip route 12.12.12.12/32 json > { > "12.12.12.12/32":[ > { > "prefix":"12.12.12.12/32", > [..] > "nexthops":[ > [..] > { > "flags":3, > "fib":true, > "ip":"10.0.2.88", > "afi":"ipv4", > "interfaceIndex":2, > "interfaceName":"mgmt0", > "resolver":true, > "active":true, > "weight":1 > }, > { > [..] > }, > { > "flags":3, <-- expected duplicate should be present > "fib":true, <-- expected fib should be unset > "ip":"10.0.2.68", > "afi":"ipv4", > "interfaceIndex":2, > "interfaceName":"mgmt0", > "resolver":true, > "active":true, > "weight":1 > } > ] > } > ] > } When updating the route, only the nexthop-group has changed. The NHG dataplane result does not have the duplicate flag set, because it has been removed before sending it to the dataplane (see below link). Fix this by updating the duplicate flag on the duplicated nexthops. Also, avoid setting the FIB flag when duplicate is found. Link: 8dbc800 ("zebra: Prevent duplication and overflow in nhe2grp") Signed-off-by: Philippe Guibert <[email protected]>
On the creation of a nexthop-group, check the afi value of all the nexthops of that nexthop-group. Check that the value is the same for all of them, and cancel the nexthop-group add otherwise. Signed-off-by: Mark Stapp <[email protected]> Signed-off-by: Philippe Guibert <[email protected]>
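The afi consistency rule above is simple enough to state as a one-line check. The following Python sketch illustrates the rule only (function name and input shape are illustrative, not zebra's code).

```python
import ipaddress

def nhg_afi_consistent(nexthops):
    """Check that all nexthop addresses of a nexthop-group share the same
    address family; the NHG add would be cancelled otherwise."""
    versions = {ipaddress.ip_address(nh).version for nh in nexthops}
    return len(versions) <= 1

assert nhg_afi_consistent(["192.0.2.1", "192.0.2.2"])
assert not nhg_afi_consistent(["192.0.2.1", "2001:db8::1"])  # mixed afi rejected
print("afi check ok")
```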
When a failover happens on ECMP paths that use the same recursively resolved nexthop, ZEBRA replaces the old NHG with a new one, and updates the pointer of all routes using that nexthop. If only the recursive nexthop changed, there is no need to replace the old NHG. Modify the zebra_nhg_proto_add() function to update the recursive nexthop on the original NHG. Signed-off-by: Philippe Guibert <[email protected]>
Add a test that checks that the duplicate flag is correctly propagated in nexthop-groups. Signed-off-by: Philippe Guibert <[email protected]>
The list of prefixes that use a given nexthop group is not available; as a consequence, when an event happens on a nexthop group, the whole list of tables must be parsed to check which prefixes are impacted by the nexthop group. In each nhg_hash_entry, create a hash list that contains the list of prefixes added. That list is updated on each ROUTE_ADD and ROUTE_DELETE action, once the dataplane succeeds in the update. The change only populates those hash lists, which will be used in the next commit. Signed-off-by: Philippe Guibert <[email protected]>
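The per-NHG prefix back-reference idea can be modelled as a set attached to each NHG entry, updated on route install and remove results. This is toy Python mirroring the described hash list; the class and method names are illustrative, not zebra's nhg_hash_entry API.

```python
class NhgEntry:
    """Toy nhg_hash_entry with a per-NHG prefix set: dataplane
    ROUTE_ADD/ROUTE_DELETE results update the set, so an NHG event can
    find the affected prefixes without walking every routing table."""
    def __init__(self, nhg_id):
        self.nhg_id = nhg_id
        self.prefixes = set()

    def route_installed(self, prefix):
        """Record that a route using this NHG was installed."""
        self.prefixes.add(prefix)

    def route_removed(self, prefix):
        """Forget a route that no longer uses this NHG."""
        self.prefixes.discard(prefix)

nhe = NhgEntry(71428582)
nhe.route_installed("192.0.2.9/32")
nhe.route_installed("192.0.2.10/32")
nhe.route_removed("192.0.2.10/32")
print(sorted(nhe.prefixes))  # ['192.0.2.9/32']
```

On an NHG event, iterating this set replaces the full table walk described above.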
When a BGP configuration receives and installs a prefix with multiple nexthops, if a link down event happens, the nexthop resolution is not propagated. The below example illustrates the 10.0.5.1 prefix, where the resolution is given by BGP.
> # show ip nht
> [..]
> 10.0.5.1
> resolved via bgp
> via 10.0.7.2, r1-r2-eth2 (vrf default), weight 1
> via 10.0.8.2, r1-r2-eth3 (vrf default), weight 1
> via 10.0.9.2, r1-r2-eth4 (vrf default), weight 1
> via 10.0.10.2, r1-r2-eth5 (vrf default), weight 1
> Client list: pim(fd 45)
After the r1-r2-eth2 and r1-r2-eth4 interfaces go down, the nht entry for 10.0.5.1 is kept unchanged. The problem is related to nexthop groups: when using an NHG_ADD to refresh the BGP route, the NHT entry is not refreshed, and PIM is not informed about the change. Signed-off-by: Philippe Guibert <[email protected]>
The 'show nexthop-group id' command has a detail parameter. This parameter helps display the list of prefixes that the protocol daemon is registered with.
> r1# show nexthop-group rib bgp detail
> ID: 71428582 (bgp)
> RefCnt: 2
> Uptime: 00:07:54
> VRF: default(IPv4)
> Valid, Installed
> Depends: (41)
> via 192.0.2.6 (vrf default) (recursive), label 6000, weight 1
> via 172.31.0.3, r1-eth1 (vrf default), label 16006/6000, weight 1
> Prefix list:
> 192.0.2.9/32, afi IPv4, safi unicast, table 101 (vrf vrf1)
>
> r1# show nexthop-group rib bgp detail json
> [..]
> "71428582":{
> "type":"bgp",
> "refCount":2,
> "uptime":"00:05:42",
> "vrf":"default",
> "afi":"IPv4",
> "valid":true,
> "installed":true,
> "depends":[
> 41
> ],
> [..]
> "prefixList":[
> {
> "prefix":"192.0.2.9/32",
> "afi":"IPv4",
> "safi":"unicast",
> "table":101,
> "vrfId":7,
> "vrfName":"vrf1"
> }
> ]
> },
Signed-off-by: Philippe Guibert <[email protected]>
The following ASAN issue has been observed: > ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840 > READ of size 4 at 0x6160000acba4 thread T0 > #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315 > FRRouting#1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331 > FRRouting#2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680 > FRRouting#3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490 > FRRouting#4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717 > FRRouting#5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413 > FRRouting#6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919 > FRRouting#7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454 > FRRouting#8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822 > FRRouting#9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212 > FRRouting#10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968 > FRRouting#11 0x7f26f275b8a9 in route_node_free lib/table.c:75 > FRRouting#12 0x7f26f275bae4 in route_table_free lib/table.c:111 > FRRouting#13 0x7f26f275b749 in route_table_finish lib/table.c:46 > FRRouting#14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191 > FRRouting#15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244 > FRRouting#16 0x55910c4f40db in zebra_finalize zebra/main.c:249 > FRRouting#17 0x7f26f2777108 in event_call lib/event.c:2011 > FRRouting#18 0x7f26f264180e in frr_run lib/libfrr.c:1212 > FRRouting#19 0x55910c4f49cb in main zebra/main.c:531 > FRRouting#20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 > FRRouting#21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392 > FRRouting#22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114) It happens with FRR using the kernel. 
During shutdown, zebra attempts to obtain the namespace identifier in order to prepare zebra dataplane nexthop messages. Fix this by accessing the ns structure. Signed-off-by: Philippe Guibert <[email protected]>
Force-pushed from bbecb2b to fd6909b
This series of commits introduces the ability for ZEBRA to handle recursive routes.
When nexthop tracking notifies a protocol daemon of a change, the protocol daemon re-installs the route and expects ZEBRA to resolve the recursive route according to the new resolution.
Today, there is no recursion handling for NHGs at the zebra level. This is a must if protocol daemons want to install routes that rely on recursion.