
[BUG] selfbuilt datadog agent 7.57.2 segfaults on disk check #30149

Open
xnox opened this issue Oct 15, 2024 · 2 comments

Comments

@xnox

xnox commented Oct 15, 2024

Agent Environment

# agent version
Agent 7.57.2 - Commit: 38ba0c7 - Serialization version: v5.0.130 - Go version: go1.23.2

Describe what happened:
Running agent check disk crashes the agent with a segmentation fault:

# agent check disk
SIGSEGV: segmentation violation
PC=0x777bb940e736 m=15 sigcode=1 addr=0x0
signal arrived during cgo execution

go-traceback.txt

Describe what you expected:
The check completes successfully.

Steps to reproduce the issue:
Run agent check disk from a self-built agent. Note that 7.56.2 works fine.

Additional environment details (Operating System, Cloud provider, etc):
Docker container on Ubuntu linux.

@xnox
Author

xnox commented Oct 15, 2024

Oh, I think that massive traceback is trying to say


85    futex(0xc000180848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
80    <... newfstatat resumed>0xc00111b898, 0) = -1 ENOENT (No such file or directory)
80    newfstatat(AT_FDCWD, "/sbin/blkid", 0xc00111b968, 0) = -1 ENOENT (No such file or directory)
80    newfstatat(AT_FDCWD, "/bin/blkid",  <unfinished ...>
79    <... tgkill resumed>)             = 0
80    <... newfstatat resumed>0xc00111ba38, 0) = -1 ENOENT (No such file or directory)
79    nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>

that blkid is not available. It would be nice if this were reported as a clearer error message.
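The newfstatat calls in the strace output look like an ordinary PATH lookup: each directory is probed for blkid in turn, and every probe fails with ENOENT. A minimal Python sketch of that lookup is below. The helper find_on_path() is hypothetical (it is not part of the Agent or its checks); it only illustrates how the list of tried locations could be folded into a clearer error message.

```python
import os
import stat


def find_on_path(name, path=None):
    """Search each PATH directory for an executable called `name`.

    Hypothetical helper for illustration only: it mirrors the sequence of
    stat calls visible in the strace output and, on failure, raises an
    error listing every location that was tried.
    """
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    tried = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        tried.append(candidate)
        try:
            st = os.stat(candidate)  # corresponds to a newfstatat() call
        except OSError:
            continue  # ENOENT (or similar): try the next directory
        if stat.S_ISREG(st.st_mode) and os.access(candidate, os.X_OK):
            return candidate
    raise FileNotFoundError(
        f"{name!r} not found on PATH; tried: {', '.join(tried)}"
    )


if __name__ == "__main__":
    try:
        print(find_on_path("blkid"))
    except FileNotFoundError as err:
        print(err)
```

In a container without blkid, the message names each directory that was probed, which is the same information the strace output shows, only without needing strace.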

@sgnn7
Contributor

sgnn7 commented Oct 16, 2024

Hi @xnox,
It's hard to tell the root cause from the attached log, but it definitely seems (as you found) to be somewhere in the Python check execution. Because Python is loaded by Go through C APIs, error messages can get lost between the two runtimes without the right flags and tooling.

If your strace output is accurate, you have already found the problem with your build, and you can find the relevant code here. If you're interested in deeper C-level debugging, I'd recommend following the gdb troubleshooting steps. It is a bit unexpected that you're not getting the Python error output from the try/catch, though; there's not enough info in the bug report to indicate why, or whether this is somehow connected to how you're building the Agent.
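For comparison, plain CPython reports a missing binary as an ordinary, catchable exception rather than a crash: subprocess raises FileNotFoundError when the executable cannot be found. A minimal sketch, assuming a check shells out to blkid through subprocess, is below. The helper read_blkid() is hypothetical and is not the disk check's actual code.

```python
import subprocess


def read_blkid(binary="blkid"):
    """Run blkid and return its stdout, or raise a readable error.

    Hypothetical helper; the real disk check may invoke blkid differently.
    """
    try:
        result = subprocess.run(
            [binary], capture_output=True, text=True, check=False
        )
    except FileNotFoundError as err:
        # subprocess raises FileNotFoundError when the executable is missing.
        raise RuntimeError(
            f"{binary!r} was not found on PATH; the check cannot run without it"
        ) from err
    return result.stdout
```

Wrapping the call this way keeps the failure on the Python side, where it would presumably be reported as a check error; why that did not happen in the self-built Agent is the open question above.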

2 participants