[BUG] selfbuilt datadog agent 7.57.2 segfaults on disk check #30149

xnox · 2024-10-15T17:53:27Z

Agent Environment

# agent version
Agent 7.57.2 - Commit: 38ba0c7 - Serialization version: v5.0.130 - Go version: go1.23.2

Describe what happened:
agent disk check

# agent check disk
SIGSEGV: segmentation violation
PC=0x777bb940e736 m=15 sigcode=1 addr=0x0
signal arrived during cgo execution

go-traceback.txt

Describe what you expected:
successful

Steps to reproduce the issue:
Run agent check disk from a self-built agent. Note that 7.56.2 works fine.

Additional environment details (Operating System, Cloud provider, etc):
Docker container on Ubuntu linux.


xnox · 2024-10-15T18:19:30Z

Oh, I think that massive traceback is trying to say


85    futex(0xc000180848, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
80    <... newfstatat resumed>0xc00111b898, 0) = -1 ENOENT (No such file or directory)
80    newfstatat(AT_FDCWD, "/sbin/blkid", 0xc00111b968, 0) = -1 ENOENT (No such file or directory)
80    newfstatat(AT_FDCWD, "/bin/blkid",  <unfinished ...>
79    <... tgkill resumed>)             = 0
80    <... newfstatat resumed>0xc00111ba38, 0) = -1 ENOENT (No such file or directory)
79    nanosleep({tv_sec=0, tv_nsec=20000},  <unfinished ...>

that blkid is not available. A clearer error message would be nice here.
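
For anyone who wants to rule this out before running the check, here is a minimal sketch of a pre-flight lookup. It is not how the Agent or the disk check resolves blkid. The directory list is copied from the strace excerpt only, and the helper names find_blkid and require_blkid are hypothetical. Whether the missing binary is the actual cause of the segfault is still unconfirmed.

```python
import os
import shutil

# Directories seen in the strace excerpt; the lookup order the disk check
# really uses is an assumption here.
CANDIDATE_DIRS = ["/sbin", "/bin"]


def find_blkid(dirs=CANDIDATE_DIRS):
    """Return the full path to an executable blkid, or None if it is missing."""
    return shutil.which("blkid", path=os.pathsep.join(dirs))


def require_blkid(dirs=CANDIDATE_DIRS):
    """Fail early with a readable message instead of crashing later."""
    path = find_blkid(dirs)
    if path is None:
        raise FileNotFoundError("blkid not found in: " + ", ".join(dirs))
    return path


if __name__ == "__main__":
    try:
        print("blkid found at", require_blkid())
    except FileNotFoundError as exc:
        print("disk check precondition failed:", exc)
```

Running this inside the same container as the self-built agent shows whether blkid is present in those directories at all.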

sgnn7 · 2024-10-16T20:38:56Z

Hi @xnox,
It's hard to tell the root issue from the attached log, but it definitely seems (as you found) to be somewhere in the execution of the Python checks. Because Python is loaded by Go through C APIs, error messages can get lost between the two runtimes without the right flags and tooling.

If your strace output is accurate, you have already found the problem with your build, and you can find the relevant code here. If you're interested in deeper C-level debugging, I'd recommend using the gdb troubleshooting steps. It is a bit unexpected that you're not getting the Python error output from the try/catch, though. There's not enough info in the bug report to indicate why, or whether this is connected to how you're building the Agent.

xnox added the team/triage label Oct 15, 2024

sgnn7 added team/platform-integrations and removed team/triage labels Oct 16, 2024
