Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid subtract with overflow when growing the stack on musl libc #51

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jcnelson
Copy link

@jcnelson jcnelson commented Mar 5, 2021

Hello,

I have noticed on Alpine Linux 3.13.2 with musl libc 1.2.2-r0 that I can reliably trigger a runtime panic in this crate when growing the stack while deserializing a JSON structure with serde_stacker. Relevant backtrace snippet is as follows:

thread 'main' panicked at 'attempt to subtract with overflow', /home/jude/.cargo/registry/src/github.com-1ecc6299db9ec823/stacker-0.1.13/src/lib.rs:92:35
stack backtrace:                                                           
   0: rust_begin_unwind                                                                                                                                        
   1: core::panicking::panic_fmt   
   2: core::panicking::panic                                                                                                                                   
   3: stacker::remaining_stack::{{closure}}                                                                                                                    
             at /home/jude/.cargo/registry/src/github.com-1ecc6299db9ec823/stacker-0.1.13/src/lib.rs:92        
   4: core::option::Option<T>::map                                                                                                                             
             at /home/buildozer/aports/community/rust/src/rustc-1.47.0-src/library/core/src/option.rs:437
   5: stacker::remaining_stack                                                                                                                                 
             at /home/jude/.cargo/registry/src/github.com-1ecc6299db9ec823/stacker-0.1.13/src/lib.rs:92
   6: stacker::maybe_grow                                                                                                                                      
             at /home/jude/.cargo/registry/src/github.com-1ecc6299db9ec823/stacker-0.1.13/src/lib.rs:50
   7: <serde_stacker::de::Visitor<V> as serde::de::Visitor>::visit_map                                                                                         
             at /home/jude/pkg/serde-stacker/src/de.rs:511                                                                                                     
   8: <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
             at /home/jude/.cargo/registry/src/github.com-1ecc6299db9ec823/serde_json-1.0.56/src/de.rs:1817
   9: <serde_stacker::de::Deserializer<D> as serde::de::Deserializer>::deserialize_struct
             at /home/jude/pkg/serde-stacker/src/de.rs:278
             
/* snip */

This PR updates the remaining_stack() method to simply return None in the event that current_stack_ptr() < get_stack_limit().

All of the included tests pass locally for me, and the panic no longer manifests with my patched copy of stacker, but I'm not sure if this is the right way to solve this problem. As such, please feel free to treat this PR as informational.

@jcnelson jcnelson changed the title Avoid integer underflow when growing the stack on musl libc Avoid subtract with overflow when growing the stack on musl libc Mar 5, 2021
@nagisa
Copy link
Member

nagisa commented Mar 5, 2021

Hm, this calculation wrapping would suggest that the stack has already overflowed, and in turn that:

a) the stack, for some reason, hasn't guard pages (or they aren't being hit);
b) redzone is set too low when stacker::maybe_grow is called.

I… think. Making this change doesn't necessarily look too wrong to me, but I feel that returning a None on overflow here is just masking a more serious issue elsewhere.

@jcnelson
Copy link
Author

jcnelson commented Mar 5, 2021

Making this change doesn't necessarily look too wrong to me, but I feel that returning a None on overflow here is just masking a more serious issue elsewhere.

This is my concern as well. Can I provide anything else that might be useful from the program execution to help determine this? For example, would find a valgrind trace useful? I can confirm for you that this problem does not manifest with GNU libc (tested on Debian testing).

@nagisa
Copy link
Member

nagisa commented Mar 5, 2021

Most helpful would probably be printing out the values for current_ptr and stack_limit, either via a debugger or printing it out in the code I think.

@jirutka
Copy link

jirutka commented Jul 15, 2023

I’m experiencing this bug when building deno on Alpine Linux.

@nagisa
Copy link
Member

nagisa commented Jul 16, 2023

@jirutka anyway you could attach a debugger to the failing binary and poke around what are the values of limit and current_ptr at the time of failure?

@jirutka
Copy link

jirutka commented Jul 18, 2023

(usize) limit = 140737488117760
(usize) current_ptr = 140737488082160

current_ptr - limit = -35600

@jirutka
Copy link

jirutka commented Jul 18, 2023

I’ve tried to build it with the fallback variant of guess_os_stack_limit() and it doesn’t segfault anymore.

@nagisa
Copy link
Member

nagisa commented Jul 18, 2023

Yeah, disabling guess_os_stack_limit on musl seems like a reasonable change to me. It is unfortunate that doing so is necessary, but unless somebody is able to debug why it doesn't work right, people will have to live with a little less performance on the very first recursion through stacker.

@12101111
Copy link

build.rs of deno_runtime allocated ~4MB struct on stack before the invocation of maybe_grow , which is far beyond the guard page.

So this is not an issue of stacker, the stack has already overflowed in the recursive code from
https://github.com/swc-project/swc/blob/ce7f4b693d52b4c94224c14318dc397ada1797ed/crates/swc_ecma_parser/src/parser/stmt.rs#L512C21-L512C21

My workaround is creating a new thread with 8MB stack.

https://github.com/12101111/overlay/blob/master/dev-lang/deno/files/stacker.patch
https://github.com/12101111/overlay/blob/master/dev-lang/deno/files/fix-stackoverflow.patch

@nagisa
Copy link
Member

nagisa commented Jul 19, 2023

Is it known why the guard page was not hit on function entry? Is it that stack probes do not exist/work on musl? Or is it that musl systems don’t guard their stack with a guard page?

@nekopsykose
Copy link

@richfelker do you know why this would happen?

@richfelker
Copy link

I would guess there's code that's assuming the value returned by guess_os_stack_limit is a hard limit that can never be exceeded, rather than the current stack size. The way the "linux" case in that function works, via pthread_getattr_np, musl will return a value that is a promise you can use up to the returned value without crashing from stack overflow, rather than a promise that if you use more than the returned value, you will crash with stack overflow.

Note that the value glibc is returning is not reliable; it's possible to crash before that (under memory pressure) or that it's inaccurate due to change in rlimit after process image start. For example, if stack rlimit was 1M at exec time, but was later increased to 8M before calling pthread_getattr_np, I believe glibc will tell you you have 8M available, but the kernel setup the memory mappings such that only 1M might be available.

I'm not sure how to best fix this. It's a messy area where exiting practice was rather broken,

@jirutka
Copy link

jirutka commented Jul 20, 2023

https://github.com/12101111/overlay/blob/master/dev-lang/deno/files/stacker.patch
https://github.com/12101111/overlay/blob/master/dev-lang/deno/files/fix-stackoverflow.patch

@12101111, I used your patches (thanks a lot for your insight!), the trick with the runtime build script helped and most of the tests passed. However, (only) the following tests (out of 1289) panicked on overflow from stacker:

  • integration::node_unit_tests::_fs_chmod_test
  • integration::node_unit_tests::_fs_read_test
  • integration::node_unit_tests::_fs_readlink_test
  • integration::node_unit_tests::_fs_stat_test
  • integration::npm::esm_module
  • integration::run::_045_proxy

Then I rebuilt it again with your patches and my patch disabling guess_os_stack_limit, now these tests passed. I’m really curious why these particular tests hit the bug and not others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants