Replace conda with pyenv to fix incorrect libstdc++ use in jammy CI. #6966

ssheorey · 2024-09-13T17:38:14Z

Type

Bug fix (non-breaking change which fixes an issue): Fixes #
New feature (non-breaking change which adds functionality). Resolves #
Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

The CUDA jammy CI Python tests fail because pytest causes the conda libstdc++ library to be loaded instead of the system libstdc++. The conda copy is older and prevents the Open3D CUDA pybind library from loading.
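
One way to check which libstdc++ a Python process actually loaded is to inspect its memory map. The sketch below is a diagnostic aid and not part of this PR. It assumes Linux, where `/proc/<pid>/maps` lists mapped files, and the helper name `loaded_libs` is hypothetical.

```python
from pathlib import Path


def loaded_libs(maps_text: str, name: str = "libstdc++") -> list[str]:
    """Return the unique paths of mapped files whose basename contains `name`.

    `maps_text` is the content of /proc/<pid>/maps (Linux only).
    """
    paths: list[str] = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6:
            path = fields[5].strip()
            if name in Path(path).name and path not in paths:
                paths.append(path)
    return paths
```

Calling `loaded_libs(Path("/proc/self/maps").read_text())` after importing the extension module should show whether the library came from a conda prefix or from the system library directory.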

Checklist:

I have run python util/check_style.py --apply to apply Open3D code style
to my code.
This PR changes Open3D behavior or adds new functionality.
- Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
  updated accordingly.
- I have added or updated C++ and / or Python unit tests OR included test
  results (e.g. screenshots or numbers) here.
I will follow up and update the code if CI fails.
For fork PRs, I have selected Allow edits from maintainers.

Description

Replace the conda Python install with a pyenv Python install, which uses the system libraries.
Use RPATH instead of RUNPATH to load libc++abi.so directly in Python. There is no longer any need to find and load it explicitly for Filament.
Use libc++ v11 to build in Ubuntu 22.04 and warn if a newer version is used. libc++ v12 and later use LLVM libunwind, which is incompatible with the system libunwind and causes Python to crash.
TODO: Solution for Ubuntu 24.04 (newer libc++ in general).
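
For background on the RPATH change: the dynamic loader searches `DT_RPATH` before `LD_LIBRARY_PATH` and applies it to transitive dependencies, while `DT_RUNPATH` is searched after `LD_LIBRARY_PATH` and applies only to the object's direct dependencies. A minimal sketch for checking which tag a built library carries is shown below. It assumes the output of binutils `readelf -d <library>` is passed in as text, and the helper name `search_path_entries` is hypothetical.

```python
import re

# Matches dynamic-section lines such as:
#   0x000000000000000f (RPATH)    Library rpath: [$ORIGIN]
#   0x000000000000001d (RUNPATH)  Library runpath: [$ORIGIN]
_ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library (?:rpath|runpath): \[(.*)\]")


def search_path_entries(readelf_output: str) -> dict[str, list[str]]:
    """Map 'RPATH' / 'RUNPATH' to the directories listed in `readelf -d` output."""
    entries: dict[str, list[str]] = {}
    for line in readelf_output.splitlines():
        match = _ENTRY.search(line)
        if match:
            tag, dirs = match.groups()
            # Directories are colon-separated; skip empty components.
            entries.setdefault(tag, []).extend(d for d in dirs.split(":") if d)
    return entries
```

With GNU ld, `--disable-new-dtags` emits RPATH and `--enable-new-dtags` emits RUNPATH. How this PR sets the flag in the build is not shown in this conversation.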

update-docs · 2024-09-13T17:38:19Z

Thanks for submitting this pull request! The maintainers of this repository would appreciate if you could update the CHANGELOG.md based on your changes.

ssheorey · 2024-09-17T05:34:08Z

Seg fault in Python test_tensormap.py here when built with gcc 11 (Ubuntu 22.04 jammy) in Release config. No seg fault in Debug or RelWithDebInfo configs.

...
def test_tensormap(device):
...
    # __delitem__ operator. SEGFAULT
    with pytest.raises(RuntimeError) as excinfo:
        del tm.positions
        assert 'cannot be deleted' in str(excinfo.value)
...
    # Set primary key. SEGFAULT
    with pytest.raises(KeyError) as e:
        tm.primary_key = o3c.Tensor.ones((2, 3), dtype, device)
...
    # Get unknown attributes. SEGFAULT
    with pytest.raises(KeyError) as e:
        normals = tm.normals

Fix: switch to gcc 13.

Update NanoFlannImpl.h to fix this warning, which is treated as an error:

    inlined from ‘open3d::core::nns::impl::{anonymous}::_KnnSearchCPU<float, int, open3d::core::nns::NeighborSearchAllocator<float, int>, 1>(open3d::core::nns::NanoFlannIndexHolderBase*, int64_t*, size_t, const float*, size_t, const float*, size_t, int, bool, bool, open3d::core::nns::NeighborSearchAllocator<float, int>&)::<lambda(const float*, const float*, size_t)>’ at /home/ssheorey/Documents/Open3D/Code/Open3D/cpp/open3d/core/nns/NanoFlannImpl.h:126:5,
    inlined from ‘open3d::core::nns::impl::{anonymous}::_RadiusSearchCPU<float, int, open3d::core::nns::NeighborSearchAllocator<float, int>, 1>(open3d::core::nns::NanoFlannIndexHolderBase*, int64_t*, size_t, const float*, size_t, const float*, size_t, const float*, bool, bool, bool, bool, open3d::core::nns::NeighborSearchAllocator<float, int>&)::<lambda(const tbb::detail::d1::blocked_range<long unsigned int>&)>’ at /home/ssheorey/Documents/Open3D/Code/Open3D/cpp/open3d/core/nns/NanoFlannImpl.h:258:41:
/usr/include/c++/13/bits/new_allocator.h:168:33: error: ‘void operator delete(void*, std::size_t)’ called on pointer ‘<unknown>’ with nonzero offset [4, 9223372036854775804] [-Werror=free-nonheap-object]

gcc-11 leads to a seg fault in tensormap in Release mode.

benjaminum

lgtm

benjaminum · 2024-09-17T08:56:47Z

cpp/open3d/core/nns/NanoFlannImpl.h

@@ -147,8 +140,9 @@ void _KnnSearchCPU(NanoFlannIndexHolderBase *holder,
                    for (size_t valid_i = 0; valid_i < num_valid; ++valid_i) {
                        TIndex idx = result_indices[valid_i];
                        if (ignore_query_point &&
-                            points_equal(&queries[i * dimension],
-                                         &points[idx * dimension], dimension)) {
+                            std::equal(&queries[i * dimension],


Nice catch! This should significantly reduce the number of heap allocations.

benjaminum · 2024-09-17T08:59:06Z

docker/Dockerfile.ci

-        bash Miniconda3-latest-Linux-x86_64.sh -b; \
-        rm Miniconda3-latest-Linux-x86_64.sh; \
+        curl https://pyenv.run | bash \
+        && pyenv update \


I was working on switching to Miniforge in #6717 😅

Miniforge likely won't fix this issue, since it's a copy of conda and likely uses its own libstdc++ as well.

ssheorey · 2024-09-20T15:50:27Z

Seg fault with Python, CUDA 11.8, 12.1 and Ubuntu 22.04. No issue with C++, the open3d-cpu package, or if CUDA_VISIBLE_DEVICES is empty. No issue on Ubuntu 20.04 or Windows.

… No need to find and load explicitly. Use libc++11 to build in Ubuntu 22.04. Warn if newer version is used. TODO: Solution for Ubuntu 24.04

benjaminum

approved. See question comment

benjaminum · 2024-09-29T09:30:07Z

docker/Dockerfile.ci

@@ -67,34 +67,55 @@ RUN if [ "${BUILD_SYCL_MODULE}" = "ON" ]; then \
        rm -rf /etc/apt/sources.list.d/oneAPI.list; \
    fi

-# Dependencies: basic
+# Dependencies: basic and python-build
+# gcc-11 causes a seg fault in tensormap when built in Release mode. Upgrade to


Have we found the root cause for that or is it really a bug only due to gcc11?

Thanks for catching that. I think it was actually the libunwind issue. I've been working with gcc-11 on jammy (22.04) without issues. Reverted.

Replace conda with pyenv to fix incorrect libstdc++ use in jammy CI.

3e964a5

ssheorey force-pushed the ss/cuda-ci-jammy-fix branch from a95f3be to 3e964a5 Compare September 13, 2024 18:01

add path to python open3d cli

b774e5f

Use gcc-13 instead of gcc-11 when building in Ubuntu 20.04.

c7c2c03

gcc-11 leads to a seg fault in tensormap in Release mode.

ssheorey requested review from benjaminum September 17, 2024 06:35

benjaminum approved these changes Sep 17, 2024

Switch to g++-12, g++-13 not available in jammy repos.

05cf179

Use RPATH instead of RUNPATH to load libc++abi.so directly in Python.…

1bb4602

… No need to find and load explicitly. Use libc++11 to build in Ubuntu 22.04. Warn if newer version is used. TODO: Solution for Ubuntu 24.04

benjaminum approved these changes Sep 29, 2024

Undo gcc-12 instead of gcc-11.

1d1aeb1

ssheorey merged commit e88c7b1 into main Sep 30, 2024
41 of 45 checks passed

ssheorey deleted the ss/cuda-ci-jammy-fix branch September 30, 2024 06:07
