Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: fix table heap-after-free crash #16614

Draft
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

louis-6wind
Copy link
Contributor

Fix a heap-after-free that causes zebra to crash even without address-sanitizer. To reproduce:

echo "100 my_table" | tee -a /etc/iproute2/rt_tables
ip route add blackhole default table 100
ip route show table 100
ip l add red type vrf table 100
ip l del red
ip route del blackhole default table 100

==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
READ of size 1 at 0x606000154f54 thread T0
#0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
#1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
#2 0x7fa32474d783 in route_node_get lib/table.c:283
#3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
#4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
#5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
#6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
#7 0x7fa32476689c in event_call lib/event.c:1996
#8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#9 0x55b0e4e6c32a in main zebra/main.c:526
#10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
#11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)

0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
freed by thread T0 here:
#0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
#1 0x7fa324668d8f in qfree lib/memory.c:130
#2 0x7fa32474c421 in route_table_free lib/table.c:126
#3 0x7fa32474bf96 in route_table_finish lib/table.c:46
#4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
#5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
#6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
#7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
#8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
#9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
#10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
#11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
#12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
#13 0x7fa32476689c in event_call lib/event.c:1996
#14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#15 0x55b0e4e6c32a in main zebra/main.c:526
#16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
#0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fa324668c4d in qcalloc lib/memory.c:105
#2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
#3 0x7fa32474e73c in route_table_init lib/table.c:512
#4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
#5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
#6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
#7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
#8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
#9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
#10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
#11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
#12 0x7fa32476689c in event_call lib/event.c:1996
#13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
#14 0x55b0e4e6c32a in main zebra/main.c:526
#15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")

@louis-6wind
Copy link
Contributor Author

ci:rerun

@louis-6wind louis-6wind marked this pull request as ready for review August 22, 2024 09:42
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 2 times, most recently from ac9b780 to 311df79 Compare August 22, 2024 15:22
@github-actions github-actions bot added size/M and removed size/S labels Aug 22, 2024
Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand what is being fixed here - can you explain what you think is going wrong, and why you think this is the right fix? It seems ... surprising to try to change the vrf ids of route_entries. The crash backtrace looks like there's an array cell or a hash/tree entry that should have been NULL'd.

We should handle the possibility that dataplane results refer to prefixes or to vrfs that are no longer present - but of course we have to be able to detect that condition.

Or, a table-id should only be in one vrf - is that the issue, that we're not handling the migration of the tableid among vrfs, or failing to force that id to share fate with a vrf?

@donaldsharp
Copy link
Member

I actually think it's an ok way to solve the problem. Perhaps though Louis should spend a few minutes expanding the comments in the code, as that I had to spend a bit of time understanding the change

Effectively this code moves the table(and all the pointers from various data structures) into and out of the default vrf or the vrf created, when the non-default vrf is created or deleted. This way the pointers for the vrf always are correct from a data structure point of view and we will not get any type of use after free that was happening.

It does look like to me, though, that there are address sanitizer issues that were found with this approach. At that point I stopped looking at it and was waiting for Louis to get back to it and fix the problems.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

ok - but there are things I don't think I get.
the config example creates a vrf with a table, then the vrf is deleted. our normal practice is that when a vrf is deleted, all the resources in that vrf are deleted. how do we detect the case where that "vrf RED table 100" isn't just being deleted, and decide not to free the RED routes, but move them to some other vrf?

the config example uses table 100, in the default vrf, I guess, then uses it for a vrf. is the correct thing to do to delete everything, and start with an empty table 100 in the new vrf? or is the correct thing to "move" ... things from one vrf to the new vrf? do we have other cases where we just "move stuff" among vrfs?

I actually think it's an ok way to solve the problem. Perhaps though Louis should spend a few minutes expanding the comments in the code, as that I had to spend a bit of time understanding the change

If we want to have a function that says "move this table from vrf A to vrf B", or "move the routes (and ... anything else?) from vrf A to vrf B", then that should be a function. just sticking a loop in that pokes at the cells in an re is ... awkward.

Effectively this code moves the table(and all the pointers from various data structures) into and out of the default vrf or the vrf created, when the non-default vrf is created or deleted. This way the pointers for the vrf always are correct from a data structure point of view and we will not get any type of use after free that was happening.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

Looking a little more, I think the problem is introduced early on: we have a route in vrf default, table 100; we are informed that vrf RED table 100 has been created - and that vrf/zvrf takes over the table, but we just ... leave the route in that table, which has sort of become the property of vrf RED. I think we should fix that - and also decide what to do about cleanup. maybe kernel routes are a special case, that need special handling, since the kernel behavior here is sort of ... distinctive?

@donaldsharp
Copy link
Member

fix it to what? I actually disagree that it's vrf default table 100.
it's just table 100.

@mjstapp
Copy link
Contributor

mjstapp commented Aug 28, 2024

every route (re, rn) has a vrf - and in the case of this config example, that is the default vrf.
and when the config creates the new vrf, those routes ... just sort of float around, in a table that new represents RED, but pointing to default. and there's at least one nhg too, in this case, that's also left in "default".

fix it to what? I actually disagree that it's vrf default table 100. it's just table 100.

@donaldsharp
Copy link
Member

I spent some time talking w/ Mark about this on Friday and I also spent some time testing this fix out. I believe that some of the issues that Mark was seeing was a result of another problem that was unrelated( and fixed recently as well ). I'm pretty happy with this at the moment.

@louis-6wind
Copy link
Contributor Author

Thank you for the time you spent on this @donaldsharp and @mjstapp
I try to take some times to answer to you this week

@donaldsharp
Copy link
Member

During testing this change introduces these new memory leaks:

commit fb23d4e2b364525cfbea7768ac6fe22d69fd1b16
Author: Louis Scalbert <[email protected]>
Date:   Thu Aug 22 11:12:29 2024 +0200

    zebra: fix table heap-after-free crash
    
    Fix a heap-after-free that causes zebra to crash even without
    address-sanitizer. To reproduce:

@donaldsharp
Copy link
Member

so after some more thought I am pretty happy with this, we need to fix the memory leaks introduced as well as I think we need a topotest for this situation.

@donaldsharp
Copy link
Member

2024-09-04 09:41:14,405 WARNING: topo: r1: zebra exited leaving pidfile /var/run/frr/zebra.pid (562470)
r1: zebra has memory leaks:
zebra: 
  ## showing active allocations in memory group libfrr
zebra:     Nexthop                       :      4 *        152
zebra: 
  ## showing active allocations in memory group Native message allocations
zebra: 
  ## showing active allocations in memory group logging subsystem
zebra: 
  ## showing active allocations in memory group Label Manager
zebra: 
  ## showing active allocations in memory group Table Manager
zebra: 
  ## showing active allocations in memory group SRv6 Manager
zebra: 
  ## showing active allocations in memory group zebra
zebra:     Zebra DPlane Ctx              :      4 *       1520

Copy link
Contributor

@mjstapp mjstapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think there's something missing - here's an example:

$ sudo ip route add blackhole default table 111
$ sudo ip link add RED type vrf table 111

and the frr show output:

ubu-24-arm# do sho ip route table all
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 254:
K>* 0.0.0.0/0 [0/1002] via 10.211.55.2, enp0s5, src 10.211.55.6, weight 1, 00:02:54
C>* 2.2.2.0/24 is directly connected, ANNIE, weight 1, 00:02:54
L>* 2.2.2.1/32 is directly connected, ANNIE, weight 1, 00:02:54
C>* 3.3.3.0/24 is directly connected, BETTY, weight 1, 00:02:54
L>* 3.3.3.1/32 is directly connected, BETTY, weight 1, 00:02:54
K * 10.211.55.0/24 [0/1002] is directly connected, enp0s5, weight 1, 00:02:54
C>* 10.211.55.0/24 is directly connected, enp0s5, weight 1, 00:02:54
K>* 10.211.55.2/32 [0/100] is directly connected, enp0s5, weight 1, 00:02:54
L>* 10.211.55.6/32 is directly connected, enp0s5, weight 1, 00:02:54

ubu-24-arm# 
ubu-24-arm# do sho ip route table 111
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 111:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39

ubu-24-arm# do sho ip route vrf RED
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF RED:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:53

ubu-24-arm#

@@ -216,8 +220,21 @@ static int zebra_vrf_disable(struct vrf *vrf)
* we no-longer need this pointer.
*/
for (safi = SAFI_UNICAST; safi <= SAFI_MULTICAST; safi++) {
zebra_router_release_table(zvrf, zvrf->table_id, afi,
safi);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear about removing this: if there's an empty table, wouldn't it be ok to call 'release' on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now only remove it it is empty

@louis-6wind
Copy link
Contributor Author

I still think there's something missing - here's an example:

$ sudo ip route add blackhole default table 111
$ sudo ip link add RED type vrf table 111

and the frr show output:

ubu-24-arm# do sho ip route table all
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 254:
K>* 0.0.0.0/0 [0/1002] via 10.211.55.2, enp0s5, src 10.211.55.6, weight 1, 00:02:54
C>* 2.2.2.0/24 is directly connected, ANNIE, weight 1, 00:02:54
L>* 2.2.2.1/32 is directly connected, ANNIE, weight 1, 00:02:54
C>* 3.3.3.0/24 is directly connected, BETTY, weight 1, 00:02:54
L>* 3.3.3.1/32 is directly connected, BETTY, weight 1, 00:02:54
K * 10.211.55.0/24 [0/1002] is directly connected, enp0s5, weight 1, 00:02:54
C>* 10.211.55.0/24 is directly connected, enp0s5, weight 1, 00:02:54
K>* 10.211.55.2/32 [0/100] is directly connected, enp0s5, weight 1, 00:02:54
L>* 10.211.55.6/32 is directly connected, enp0s5, weight 1, 00:02:54

ubu-24-arm# 
ubu-24-arm# do sho ip route table 111
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 111:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39

ubu-24-arm# do sho ip route vrf RED
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF RED:
K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:53

ubu-24-arm#

You mean that the following ouput was incorrect, right ?

VRF default table 111:

I now display:

Table 111:

@louis-6wind
Copy link
Contributor Author

louis-6wind commented Sep 20, 2024

@donaldsharp @mjstapp I have added more explanations in the commit logs and added more comments in the code.

The path leading to the heap-after-free issue is somewhat complex and might not be worth detailing in full. The root cause is the table and route entries VRF mismatch.

I still need to:

  • check the memory leaks
  • build a topotest
  • fix the issue where the blackhole route disappears upon the first FRR route creation.

@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 2 times, most recently from b9a5d5b to b0ea75e Compare September 24, 2024 09:37
@frrbot frrbot bot added the tests Topotests, make check, etc label Sep 24, 2024
@louis-6wind louis-6wind force-pushed the fix-otable-heap-after-free branch 5 times, most recently from de9161f to 48fcd48 Compare September 26, 2024 07:35
Copy link

github-actions bot commented Oct 5, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

louis-6wind and others added 10 commits October 9, 2024 18:14
Fix a heap-after-free that causes zebra to crash even without
address-sanitizer. To reproduce:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l del red
> ip route del blackhole default table 100

Zebra manages routing tables for all existing Linux RT tables,
regardless of whether they are assigned to a VRF interface. When a table
is not assigned to any VRF, zebra arbitrarily assigns it to the default
VRF, even though this is not strictly accurate (the code expects this
behavior).

When an RT table is created after a VRF, zebra correctly assigns the
table to the VRF. However, if a VRF interface is assigned to an existing
RT table, zebra does not update the table owner, which remains as the
default VRF. As a result, existing routing entries remain under the
default VRF, while new entries are correctly assigned to the VRF. The
VRF mismatch is unexpected in the code and creates crashes and memory
related issues.

Furthermore, Linux does not automatically delete RT tables when they are
unassigned from a VRF. It is incorrect to delete these tables from zebra.

Instead, at VRF disabling, do not release the table but reassign it to
the default VRF. At VRF enabling, change the table owner back to the
appropriate VRF.

> ==2866266==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000154f54 at pc 0x7fa32474b83f bp 0x7ffe94f67d90 sp 0x7ffe94f67d88
> READ of size 1 at 0x606000154f54 thread T0
>     #0 0x7fa32474b83e in rn_hash_node_const_find lib/table.c:28
>     #1 0x7fa32474bab1 in rn_hash_node_find lib/table.c:28
>     #2 0x7fa32474d783 in route_node_get lib/table.c:283
>     #3 0x7fa3247328dd in srcdest_rnode_get lib/srcdest_table.c:231
>     FRRouting#4 0x55b0e4fa8da4 in rib_find_rn_from_ctx zebra/zebra_rib.c:1957
>     FRRouting#5 0x55b0e4fa8e31 in rib_process_result zebra/zebra_rib.c:1988
>     FRRouting#6 0x55b0e4fb9d64 in rib_process_dplane_results zebra/zebra_rib.c:4894
>     FRRouting#7 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#8 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#9 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#10 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>     FRRouting#11 0x55b0e4e2d649 in _start (/usr/lib/frr/zebra+0x1a1649)
>
> 0x606000154f54 is located 20 bytes inside of 56-byte region [0x606000154f40,0x606000154f78)
> freed by thread T0 here:
>     #0 0x7fa324ca9b6f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:123
>     #1 0x7fa324668d8f in qfree lib/memory.c:130
>     #2 0x7fa32474c421 in route_table_free lib/table.c:126
>     #3 0x7fa32474bf96 in route_table_finish lib/table.c:46
>     FRRouting#4 0x55b0e4fbca3a in zebra_router_free_table zebra/zebra_router.c:191
>     FRRouting#5 0x55b0e4fbccea in zebra_router_release_table zebra/zebra_router.c:214
>     FRRouting#6 0x55b0e4fd428e in zebra_vrf_disable zebra/zebra_vrf.c:219
>     FRRouting#7 0x7fa32476fabf in vrf_disable lib/vrf.c:326
>     FRRouting#8 0x7fa32476f5d4 in vrf_delete lib/vrf.c:231
>     FRRouting#9 0x55b0e4e4ad36 in interface_vrf_change zebra/interface.c:1478
>     FRRouting#10 0x55b0e4e4d5d2 in zebra_if_dplane_ifp_handling zebra/interface.c:1949
>     FRRouting#11 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     FRRouting#12 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     FRRouting#13 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#14 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#15 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#16 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308
>
> previously allocated by thread T0 here:
>     #0 0x7fa324caa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fa324668c4d in qcalloc lib/memory.c:105
>     #2 0x7fa32474bf33 in route_table_init_with_delegate lib/table.c:38
>     #3 0x7fa32474e73c in route_table_init lib/table.c:512
>     FRRouting#4 0x55b0e4fbc353 in zebra_router_get_table zebra/zebra_router.c:137
>     FRRouting#5 0x55b0e4fd4da0 in zebra_vrf_table_create zebra/zebra_vrf.c:358
>     FRRouting#6 0x55b0e4fd3d30 in zebra_vrf_enable zebra/zebra_vrf.c:140
>     FRRouting#7 0x7fa32476f9b2 in vrf_enable lib/vrf.c:286
>     FRRouting#8 0x55b0e4e4af76 in interface_vrf_change zebra/interface.c:1533
>     FRRouting#9 0x55b0e4e4d612 in zebra_if_dplane_ifp_handling zebra/interface.c:1968
>     FRRouting#10 0x55b0e4e4fb89 in zebra_if_dplane_result zebra/interface.c:2268
>     FRRouting#11 0x55b0e4fb9f26 in rib_process_dplane_results zebra/zebra_rib.c:4954
>     FRRouting#12 0x7fa32476689c in event_call lib/event.c:1996
>     FRRouting#13 0x7fa32463b7b2 in frr_run lib/libfrr.c:1232
>     FRRouting#14 0x55b0e4e6c32a in main zebra/main.c:526
>     FRRouting#15 0x7fa32424fd09 in __libc_start_main ../csu/libc-start.c:308

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <[email protected]>
When a VRF is deleted, the kernel retains only its own routing entries
in the former VRF table and removes all others.q

This change ensures that routing entries created by FRR daemons are also
removed from the former zebra VRF table when the VRF is disabled.

To test:

> echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> ip l add du0 type dummy
> ifconfig du0 192.168.0.1/24 up
> ip route add blackhole default table 100
> ip route show table 100
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'
> ip l add red type vrf table 100
> ip l set du0 master red
> vtysh -c 'configure' -c 'vrf red' -c 'ip route 10.0.0.0/24 192.168.0.254'
> vtysh -c 'show ip route table 100'
> sleep 0.1
> ip l del red
> sleep 0.1
> vtysh -c 'show ip route table 100'

Fixes: d8612e6 ("zebra: Track tables allocated by vrf and cleanup")
Signed-off-by: Louis Scalbert <[email protected]>
When a routing table (RT) already has a default route before being
assigned to a VRF, the default route vanishes in zebra after the VRF
assignment.

> root@router:~# ip route add blackhole default table 100
> root@router:~# ip route show table 100
> blackhole default
> root@router:~# vtysh -c 'show ip route table 100'
> [...]
> VRF default table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), 00:00:05
> root@router:~# ip l add red type vrf table 100
> root@router:~# vtysh -c 'show ip route table 100'
> root@router:~#

Do not override the default route if it exists.

Signed-off-by: Louis Scalbert <[email protected]>
Fix vanished blackhole route when kernel routes are updated.

> root@router# echo "100 my_table" | tee -a /etc/iproute2/rt_tables
> root@router# ip l add du0 type dummy
> root@router# ifconfig du0 192.168.0.1/24 up
> root@router# ip route add blackhole default table 100
> root@router# ip route show table 100
> blackhole default
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), weight 1, 00:00:05
> root@router# ip l add red type vrf table 100
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole), weight 1, 00:00:16
> root@router# ip l set du0 master red
> root@router# vtysh -c 'show ip route table 100'
> [...]
> Table 100:
> C>* 192.168.0.0/24 is directly connected, du0, weight 1, 00:00:02
> L>* 192.168.0.1/32 is directly connected, du0, weight 1, 00:00:02
> root@router# ip route show table 100
> blackhole default
> 192.168.0.0/24 dev du0 proto kernel scope link src 192.168.0.1
> local 192.168.0.1 dev du0 proto kernel scope host src 192.168.0.1
> broadcast 192.168.0.255 dev du0 proto kernel scope link src 192.168.0.1

Fixes: d528c02 ("zebra: Handle kernel routes appropriately")
Signed-off-by: Louis Scalbert <[email protected]>
A table ID != 254 is not assigned to VRF default but "show ip route
table XX" shows that is owned by VRF default.

> ip route add blackhole default table 111
> ip link add RED type vrf table 111

> ubu-24-arm# do sho ip route vrf RED
> [...]
> VRF RED:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:53
>
> ubu-24-arm# do sho ip route table 111
> [...]
> VRF default table 111:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39

Fix the output:

> ubu-24-arm# do sho ip route table all
> [...]
> Table 111:
> K>* 0.0.0.0/0 [0/0] unreachable (blackhole) (vrf default), weight 1, 00:00:39
>
> Table 254:
> [...]

Signed-off-by: Louis Scalbert <[email protected]>
Test table ID move to a VRF and the removal of the VRF.

Signed-off-by: Louis Scalbert <[email protected]>
Add a flag to mean that a route entry is owned by a table ID and not by
a VRF. If the VRF associated to the table ID is deleted, the route
entry must not be deleted.

Signed-off-by: Louis Scalbert <[email protected]>
At VRF disabling, keep the route entries that was associated to its
table ID but not to the VRF itself. Kernel flushes these entries so we
need to reinstall them.

Signed-off-by: Louis Scalbert <[email protected]>
Test table ID move to a VRF and the removal of the VRF.

Signed-off-by: Louis Scalbert <[email protected]>
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>         #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     FRRouting#4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     FRRouting#5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     FRRouting#6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     FRRouting#7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     FRRouting#8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     FRRouting#9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     FRRouting#10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     FRRouting#11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     FRRouting#12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     FRRouting#13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     FRRouting#14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     FRRouting#15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     FRRouting#16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     FRRouting#17 0x7f26f2777108 in event_call lib/event.c:2011
>     FRRouting#18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     FRRouting#19 0x55910c4f49cb in main zebra/main.c:531
>     FRRouting#20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     FRRouting#21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     FRRouting#22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens with FRR using the kernel. During shutdown, the
namespace identifier is attempted to be obtained by zebra, in an
attempt to prepare zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <[email protected]>
@frrbot frrbot bot added the libfrr label Oct 9, 2024
@louis-6wind louis-6wind marked this pull request as draft October 9, 2024 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants