Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IVF-Flat reconstruction #1270

Open
wants to merge 48 commits into
base: branch-23.08
Choose a base branch
from

Conversation

viclafargue
Copy link
Contributor

Answers #1205

@viclafargue viclafargue requested a review from a team as a code owner February 10, 2023 15:44
@github-actions github-actions bot added the cpp label Feb 10, 2023
@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 10, 2023
@cjnolet cjnolet added the FAISS label Feb 13, 2023
@achirkin
Copy link
Contributor

NB: ivf-pq reconstruction: #1298

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viclafargue for this PR! The details of the reconstruction function needs to updated after #1271 is merged, therefore I did not review that part yet. Below please find a few comments for the rest.

Could you update the PR destription to mention that it also adds the mdspan interface for ivf-pq?

cpp/include/raft/neighbors/ivf_pq.cuh Outdated Show resolved Hide resolved
cpp/include/raft/spatial/knn/detail/ann_quantized.cuh Outdated Show resolved Hide resolved
cpp/include/raft/neighbors/ivf_pq.cuh Outdated Show resolved Hide resolved
cpp/src/distance/neighbors/ivfpq_build.cu Outdated Show resolved Hide resolved
@cjnolet
Copy link
Member

cjnolet commented May 15, 2023

@viclafargue is this PR ready for another review yet?

@cjnolet cjnolet added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels May 16, 2023
@viclafargue viclafargue requested a review from a team as a code owner May 23, 2023 13:18
@github-actions github-actions bot added the CMake label May 23, 2023
@viclafargue
Copy link
Contributor Author

viclafargue commented May 23, 2023

@viclafargue is this PR ready for another review yet?

The PR is ready for review.
cc @achirkin @tfeher @cjnolet

@viclafargue viclafargue added 3 - Ready for Review and removed 4 - Waiting on Author Waiting for author to respond to review labels May 29, 2023
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, here's couple comments on the api.

cpp/include/raft/neighbors/detail/ivf_flat_build.cuh Outdated Show resolved Hide resolved
template <typename T, typename IdxT>
void reconstruct_list_data(raft::resources const& handle,
const index<T, IdxT>& index,
device_matrix_view<T, IdxT, row_major> out_vectors,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ivf_pq, list and query dimensions are limited to uint32_t rather than IdxT (type of DB indices).

Here in the ivf_flat API you use device_matrix_view<T, IdxT, row_major> rather than device_matrix_view<T, uint32_t, row_major> , but you also set both label and offset arguments to uint32_t. Could you please check here, in the overload above, and down the detail namespace what are the appropriate indexing types and make it consistent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using IdxT for the IVF-Flat API. I corrected the code so that it uses the template type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting! Out of curiosity, could you add a test case for ivf-flat with 64-bit indexing type and try to call something with an offset slightly outside the 32-bit range? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick browsing confirms that 64-bit indexing/extents are actually not supported

return search(handle,
params,
index,
queries.data_handle(),
static_cast<std::uint32_t>(queries.extent(0)),
static_cast<std::uint32_t>(neighbors.extent(1)),
neighbors.data_handle(),
distances.data_handle(),
nullptr);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The portion of code you are mentioning is indeed probably a mistake. At the moment the IVF-Flat API functions are templated with IdxT everywhere, and it is set to use uint64_t during compilation. Do you recommend that we update the IVF-Flat API to systematically use uint32_t instead? If so, this is an extensive change, and that would maybe require a separate PR, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ivf-flat can actually work with extents and input sizes IdxT larger than 32-bit, I don't see any problem using IdxT in the api. But I suspect this is not the case. Hence we should fix it either by limiting extent types to 32-bit or making sure it works with all possible IdxT (not only those we instantiate explicitly btw). I agree this work is worth a separate PR.
However, for the new public functions added in this PR, I think it makes sense to fix the api from the beginning, keeping in mind the future changes in the other parts of the code.


thrust::device_ptr<const IdxT> vector_ids_ptr =
thrust::device_pointer_cast(vector_ids.data_handle());
IdxT max_indice =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the consensus was to opt for a "inverted-index" module (hashmap/array/whatever) shared among ivf methods to convert user indices to (label, in-cluster-offset) pairs?
This code could break if the user adds non-contiguous range of large indices to the DB.

Copy link
Contributor Author

@viclafargue viclafargue Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! Yes absolutely, this is a temporary solution that would be largely improved by the use of a hashmap in a follow-up PR. But, I thought that letting this version as it is would set the API and allow people to run a reconstruction if their use case allows it (smaller index). But, can still remove it if that's the better path forward. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking of decoupling the ivf-list hashmap struct api from the ivf-pq and ivf-flat methods. Hence a user would need to construct the hashmap explicitly once (costly operation) and then either:
a) search-by-user-ids in two calls (e.g. hashmap::get_lists_offsets + index::get_vectors)
b) pass the hashmap as an argument to the ivf index methods (index::get_vectors)

Not sure what we will decide in the end, but point is the api may change as we progress with the hashmap. Then, maybe we just can keep this function in the detail namespace for now? Hence the work won't be lost and if someone needs the functionality urgently, they can use the detail function, and we don't need to break the public api in the follow-up PR.

@viclafargue viclafargue changed the base branch from branch-23.06 to branch-23.08 July 27, 2023 12:03
@ajschmidt8 ajschmidt8 removed the request for review from a team August 3, 2023 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review ci CMake cpp FAISS improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

5 participants