[BUG] IVF-PQ index creation crashes on aarch64 for wiki_all_1M benchmark #2324

Open
mfoerste4 opened this issue May 17, 2024 · 1 comment
Labels
bug Something isn't working

mfoerste4 commented May 17, 2024

Describe the bug
The IVF-PQ build of wiki_all_1M fails on Grace/H200 with:

`CUDA Exception: Warp Illegal Address

Thread 1 "RAFT_IVF_PQ_ANN" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 39271, block (821,0,0), thread (160,0,0), device 0, sm 0, warp 15, lane 0]
0x00004002dba2b8b0 in raft::neighbors::ivf_pq::detail::process_and_fill_codes_kernel<256u, 8u, long><<<(8192,1,1),(256,1,1)>>> ()
at /home/scratch.mfoerster_gpu/raft_ws/raft/cpp/include/raft/neighbors/detail/ivf_pq_build.cuh:1164 in _ZN4raft9neighbors6ivf_pq6detail14encode_vectorsILj32ElEclElj inlined from ivf_pq_codepacking.cuh:166
1164 auto t = in_vectors(i, j, k) - pq_centers(partition_ix, k, l);
`

Steps/Code to reproduce bug
The IVF-PQ index build fails both standalone and within CAGRA.

RAFT_IVF_PQ_ANN_BENCH --build --force --data_prefix=<datasets> --benchmark_filter=raft_ivf_pq.d64-nlist16K wiki_all_1M.json

Expected behavior
The benchmark finishes without crashing.

Environment details:

  • Environment location: multiple starship/lego instances in computelab
  • Method of RAFT install: conda environment of 24.06 tip recipe with manual build
mfoerste4 added the bug label May 17, 2024

mfoerste4 (Collaborator, Author) commented

I just checked: this does not reproduce on other ARM CPUs (Altra system).
