Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up from a bunch of ubsan issues found. #16074

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

donaldsharp
Copy link
Member

@donaldsharp donaldsharp commented May 23, 2024

See individual commits

Fixes: #16755

return ret;
if (module_iter->compiled->rpcs) {
LY_LIST_FOR (&module_iter->compiled->rpcs->node, snode) {
ret = yang_snodes_iterate_subtree(snode, module,
Copy link
Contributor

@choppsv1 choppsv1 May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which error is returned for this? Basically this idiom is the C version of inherited types. That is, &rpcs->node where node is an embedded structure (struct lysc_node) at the start of struct lysc_node action. This allows the elimination of unsafe casting (struct lysc_node *)rpc and instead using type-safe checking that an struct lysc_action_node * is also a struct lysc_node * so typeof(&rpc->node) == struct lysc_node *) so &rpc->node requires no cast when trying to use as an struct lysc_node *.

In any case what &foo->bar translates to is the following:

struct foo *foo = 0;
char *result;

result = (char *)foo; /* so result is 0 */
result += offsetof(struct foo, bar);

So there is no NULL ptr dereference.

In the concrete example above when rpcs is NULL, &rpcs->node is also NULL.

unsigned char ndata[peer->max_packet_size];
uint32_t max_packet_size =
atomic_load_explicit(&peer->max_packet_size,
memory_order_relaxed);
Copy link
Contributor

@choppsv1 choppsv1 May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this may mess with the data cache and so could be rather expensive as a result. Can we instead simply guarantee that peer's are allocated with the required alignment to guarantee ((uint32_t)&peer->max_packet_size) % 4) == 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm max_packet_size is actually a uin16_t. This is probably a threading error I guess, and not an alignment one. Can you try changing max_packet_size to int rather than using the atomic function and see if that fixes the error?

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good ... waiting on comments from @choppsv1

Sanitizers are finding:

isisd/isis_spf.c:2122:22: runtime error: index 2 out of bounds for type '_uint64_t [2]'

Comparing the pattern against the rest of the code, 1 should be
subtracted.

Signed-off-by: Donald Sharp <[email protected]>
In fact there are more than several places
that we do in fact pass in NULL to get a NULL
returned.

Signed-off-by: Donald Sharp <[email protected]>
The call into zlog_target_clone was passing
&zcf->active->zt.  From running memory sanitizer
we are seeing that it is complaining about
member access within null pointer of type...
for this value.  Since the call into zlog_target_clone
checks for NULL for this value, let's just make sure
we don't do anything stupid here.

Signed-off-by: Donald Sharp <[email protected]>
lib/yang.c:248:3: runtime error: member access within null pointer of type 'struct lysc_node_action'
lib/yang.c:254:3: runtime error: member access within null pointer of type 'struct lysc_node_notif'

If this data structure happens to ever be moved around we'll start
crashing.  Let's just fix it.

Signed-off-by: Donald Sharp <[email protected]>
Running the mgmt_oper tests under heavy load
would occassionally cause the test to fail.
The retry mechanism would kick in and the
test would have succeeded.  Let's extend
the timer.

Signed-off-by: Donald Sharp <[email protected]>
Getting these messages:
./bgp_lu_topo2.test_bgp_lu2/R1/bgpd.err:bgpd/bgp_labelpool.c:310:3: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:310:3: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:497:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:499:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

Signed-off-by: Donald Sharp <[email protected]>
Found this problem:
./isis_te_topo1.test_isis_te_topo1/r1/isisd.err:lib/link_state.c:330:8: runtime error: left shift of 1073741824 by 1 places cannot be represented in type 'int'

Clean it up.

Signed-off-by: Donald Sharp <[email protected]>
Found this problem:
./eigrp_topo1.test_eigrp_topo1/r3/eigrpd.err:eigrpd/eigrp_packet.c:1113:35: runtime error: left shift of 192 by 24 places cannot be represented in type 'int'

Fix it.

Signed-off-by: Donald Sharp <[email protected]>
Error message:
./multicast_pim_dr_nondr_test.test_pim_dr_nondr_with_ospf_topo2/r5/zebra.err:zebra/rt_netlink.c:1142:15: runtime error: load of misaligned address 0x7bb40000a064 for type 'long long unsigned int', which requires 8 byte alignment

Fix it.

Signed-off-by: Donald Sharp <[email protected]>
This value is being set and read at the same time according
to the thread sanitizer
WARNING: ThreadSanitizer: data race (pid=2914253)
  Read of size 2 at 0x7ba800011b10 by thread T2:
    #0 validate_header bgpd/bgp_io.c:601 (bgpd+0x60c5e0)
    #1 read_ibuf_work bgpd/bgp_io.c:177 (bgpd+0x608ffe)
    #2 bgp_process_reads bgpd/bgp_io.c:261 (bgpd+0x609880)
    #3 event_call lib/event.c:2011 (libfrr.so.0+0x59168d)
    #4 fpt_run lib/frr_pthread.c:369 (libfrr.so.0+0x35154e)
    #5 frr_pthread_inner lib/frr_pthread.c:178 (libfrr.so.0+0x34fef6)

  Previous write of size 2 at 0x7ba800011b10 by main thread:
    #0 bgp_open_option_parse bgpd/bgp_open.c:1469 (bgpd+0xb5006f)
    #1 bgp_open_receive bgpd/bgp_packet.c:2100 (bgpd+0x6b3f5c)
    #2 bgp_process_packet bgpd/bgp_packet.c:4019 (bgpd+0x6c9549)
    #3 event_call lib/event.c:2011 (libfrr.so.0+0x59168d)
    #4 frr_run lib/libfrr.c:1217 (libfrr.so.0+0x3b04a9)
    #5 main bgpd/bgp_main.c:548 (bgpd+0x49aa3d)

  Location is heap block of size 24328 at 0x7ba80000c000 allocated by main thread:
    #0 calloc ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:667 (libtsan.so.2+0x3fdd2)
    #1 qcalloc lib/memory.c:105 (libfrr.so.0+0x3f2784)
    #2 peer_new bgpd/bgpd.c:1517 (bgpd+0x955024)
    #3 peer_create bgpd/bgpd.c:1941 (bgpd+0x95c908)
    #4 peer_remote_as bgpd/bgpd.c:2211 (bgpd+0x9614a6)
    #5 peer_remote_as_vty bgpd/bgp_vty.c:4788 (bgpd+0x881239)
    #6 neighbor_remote_as bgpd/bgp_vty.c:4869 (bgpd+0x881a28)
    #7 cmd_execute_command_real lib/command.c:1002 (libfrr.so.0+0x2b53a2)
    #8 cmd_execute_command_strict lib/command.c:1111 (libfrr.so.0+0x2b5e0b)
    #9 command_config_read_one_line lib/command.c:1271 (libfrr.so.0+0x2b6972)
    #10 config_from_file lib/command.c:1324 (libfrr.so.0+0x2b7035)
    #11 vty_read_file lib/vty.c:2607 (libfrr.so.0+0x5c0d19)
    #12 vty_read_config lib/vty.c:2853 (libfrr.so.0+0x5c1f37)
    #13 frr_config_read_in lib/libfrr.c:981 (libfrr.so.0+0x3ae76a)
    #14 event_call lib/event.c:2011 (libfrr.so.0+0x59168d)
    #15 frr_run lib/libfrr.c:1217 (libfrr.so.0+0x3b04a9)
    #16 main bgpd/bgp_main.c:548 (bgpd+0x49aa3d)

  Thread T2 'bgpd_io' (tid=2914257, running) created by main thread at:
    #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1001 (libtsan.so.2+0x63a59)
    #1 frr_pthread_run lib/frr_pthread.c:197 (libfrr.so.0+0x3500da)
    #2 bgp_pthreads_run bgpd/bgpd.c:8490 (bgpd+0x9d7716)
    #3 main bgpd/bgp_main.c:547 (bgpd+0x49a9c8)

Fix this.

Signed-off-by: Donald Sharp <[email protected]>
The call to set thread is being used and set at the
same time in various pthreads in the code.  This
should not be happening.  Let's fix it.

Signed-off-by: Donald Sharp <[email protected]>
Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to fix styling issues?

@MPFuzz
Copy link

MPFuzz commented Sep 7, 2024

Hoping this pr is being merged soon.

@MPFuzz
Copy link

MPFuzz commented Sep 13, 2024

Hi, when will this pr be merged? Thanks

@riw777
Copy link
Member

riw777 commented Sep 24, 2024

looks like there are lint errors and some comments to address yet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BGPd: member access within null pointer of type 'struct lysc_node_action'
5 participants