Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory #1882

Open
moophlo opened this issue Oct 15, 2024 · 1 comment

moophlo commented Oct 15, 2024

Describe the bug
Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory

To Reproduce
Steps to reproduce the behavior:
Just start the pod

Expected behavior
Expect the file to be there

System (please complete the following information):

  • OS version: Mint 22
  • Kernel version: 6.8.0-47-generic
  • Device plugins version: intel/intel-gpu-plugin:0.31.0
  • Hardware info: [e.g. SPR with QAT]

Additional context

I1015 17:50:35.074780       1 gpu_plugin_resource_manager.go:174] GPU device plugin resource manager enabled
W1015 17:50:40.075999       1 gpu_plugin_resource_manager.go:315] Failed to read pods from kubelet API: Get "https://192.168.10.15:10250/pods": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W1015 17:50:40.082039       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:55:40.327845       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 17:56:17.135634       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:17.135662       1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 17:56:19.431164       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:19.431252       1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 17:56:20.529522       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 17:56:20.529585       1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 17:56:22.831799       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 17:56:22.831862       1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:00:40.329390       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:05:26.528397       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:26.528495       1 gpu_plugin_resource_manager.go:461] retrying POD resolving after sleeping
W1015 18:05:28.634033       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:28.724962       1 gpu_plugin_resource_manager.go:469] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:29.229192       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
W1015 18:05:29.229208       1 gpu_plugin_resource_manager.go:398] retrying POD resolving after sleeping
W1015 18:05:30.901631       1 gpu_plugin_resource_manager.go:645] Pending POD annotations from scheduler not yet visible for pod "intel-gpu"
E1015 18:05:30.901664       1 gpu_plugin_resource_manager.go:406] allocation candidate not found, perhaps the GPU scheduler extender is not called, err:things didn't work out, but perhaps a retry will help
W1015 18:05:40.331411       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:10:40.332615       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:15:40.335025       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:20:40.337430       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
W1015 18:25:40.338627       1 labeler.go:176] Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory
Contributor

eero-t commented Oct 16, 2024

> Can't read file: open /sys/class/drm/card3/lmem_total_bytes: no such file or directory

Depending on which GPU HW you have, and which kernel driver you use for it, this message is expected:
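
To see which case applies on a given node, one option is to check which kernel driver is bound to the card and whether the attribute exists at all. The following is only a minimal sketch, not taken from the plugin's code: `describe_card` is a hypothetical helper name, `card3` is the card from the log above, and the sketch assumes the usual Linux sysfs layout where `device/driver` under the card directory is a symlink to the bound driver. It does not say which drivers or GPUs are expected to provide `lmem_total_bytes`.

```python
import os
from pathlib import Path


def describe_card(card: str, sysfs_root: str = "/sys/class/drm") -> dict:
    """Report the bound kernel driver for a DRM card and whether
    the lmem_total_bytes attribute is present (hypothetical helper)."""
    card_dir = Path(sysfs_root) / card
    driver_link = card_dir / "device" / "driver"

    # device/driver is a symlink; the last component of its target
    # is the driver name. If the link is missing, report None.
    driver = None
    if driver_link.exists():
        driver = os.path.basename(os.path.realpath(driver_link))

    return {
        "card": card,
        "driver": driver,
        "has_lmem_total_bytes": (card_dir / "lmem_total_bytes").is_file(),
    }


if __name__ == "__main__":
    # card3 is the card named in the warning from the issue.
    print(describe_card("card3"))
```

Running this on the node (or inside the plugin pod, if `/sys` is mounted there) shows whether the file is genuinely absent for that card and which driver is in use, which is the information the comment above depends on.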
