Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align driver symlinks with libnvidia-container #715

Merged
merged 1 commit into from
Oct 1, 2024

Conversation

elezar
Copy link
Member

@elezar elezar commented Sep 26, 2024

This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:

  • {{ .LibRoot }}/libcuda.so -> libcuda.so.1
  • {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
  • {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}

@elezar elezar requested a review from jojimt September 26, 2024 12:38
@elezar elezar self-assigned this Sep 26, 2024
@elezar elezar force-pushed the add-libcuda-so-symlink branch 2 times, most recently from e83c3cf to 3ef8b44 Compare September 26, 2024 13:00
for _, mount := range mounts {
dir, filename := filepath.Split(mount.Path)
// TODO: We should include the other libraries as is done here:
// https://github.com/NVIDIA/nvidia-container-toolkit/blob/79c59aeb7f59dd612793ac80a8d7022c554634bb/internal/platform-support/tegra/symlinks.go#L84-L97
Copy link
Contributor

@cdesiniotis cdesiniotis Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code you linked for tegra creates libcuda.so -> libcuda.so.1 while the code below seems to be creating libcuda.so -> libcuda.so.RM_VERSION. Why the difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was trying to simplify things by linking to the .RM_VERSION, but since we have the following the driver manifest:

libcuda.so.550.120 0755 CUDA_LIB NATIVE / MODULE:gpgpu
libcuda.so.1 0000 CUDA_SYMLINK NATIVE / libcuda.so.550.120 MODULE:gpgpu
libcuda.so 0000 CUDA_SYMLINK NATIVE / libcuda.so.1 MODULE:gpgpu

we should link to libcuda.so.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the smylink should be to libcuda.so.1 not the actual .RM_VERSION file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in latest.

@klueska
Copy link
Contributor

klueska commented Oct 1, 2024

There are other libraries that we create these one-off symlinks for in libnvidia-container. Should we also add these as part of this PR?
https://github.com/NVIDIA/libnvidia-container/blob/921e2f3197385173cf8670342e96e98afe9b3dd3/src/nvc_mount.c#L548

@elezar
Copy link
Member Author

elezar commented Oct 1, 2024

There are other libraries that we create these one-off symlinks for in libnvidia-container. Should we also add these as part of this PR? https://github.com/NVIDIA/libnvidia-container/blob/921e2f3197385173cf8670342e96e98afe9b3dd3/src/nvc_mount.c#L548

Yes. This was the intent of the comment above the link. I think doing it now makes sense and will update.

This change aligns the creation of symlinks under CDI with
the implementation in libnvidia-container. If the driver libraries
are present, the following symlinks are created:

* {{ .LibRoot }}/libcuda.so -> libcuda.so.1
* {{ .LibRoot }}/libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
* {{ .LibRoot }}/libGLX_indirect.so.0 -> libGLX_nvidia.so.{{ .Version }}

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar changed the title Add libcuda.so symlink to CDI spec Align driver symlinks with libnvidia-container Oct 1, 2024
@elezar elezar added the must-backport The changes in PR need to be backported to at least one stable release branch. label Oct 1, 2024
@elezar elezar merged commit 19482da into NVIDIA:main Oct 1, 2024
8 checks passed
@elezar elezar deleted the add-libcuda-so-symlink branch October 1, 2024 16:16
@elezar
Copy link
Member Author

elezar commented Oct 1, 2024

Backport PR created as #724

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
must-backport The changes in PR need to be backported to at least one stable release branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants