
Bug: Deletion of Descriptors isn't fully supported #201

Open
ifadams opened this issue Jul 25, 2024 · 2 comments
Labels
Bug Indicates unexpected or undesired behaviors

Comments

@ifadams
Contributor

ifadams commented Jul 25, 2024

Describe the bug

As stated in the Wiki page "Deletion Capabilities", the _deletion query lets a user delete the content in VDMS that is associated with a find query (FindImage, FindEntity, FindDescriptor). Currently, descriptor deletion is NOT fully supported.

  1. The metadata is deleted and the descriptor is no longer returned in similarity search, but its index entry still appears to be present, since the number of returned results changes after deletion.
  2. This is visible in the "Filtering on metadata" section, which starts at the bottom of page 26 of vdms_latest.pdf. In that section, ID=2 is extracted and displayed, and a K=3 nearest-neighbor search using the ID=2 vector as the query is performed and its distances displayed. ID=2 is then deleted and the search is repeated with K=5. Only 4 results are returned, with ID=2 excluded.

To Reproduce
Steps to reproduce the behavior (as shown in attached document):

  1. Add descriptors with unique property like ID
  2. Complete similarity search using K
  3. Find a descriptor within K using unique property and delete it
  4. Re-run similarity search using K (Notice K-1 results are returned and deleted descriptor isn't present)
@ifadams ifadams added the Bug Indicates unexpected or undesired behaviors label Jul 25, 2024
@ifadams
Contributor Author

ifadams commented Jul 25, 2024

Migrated from an internal discussion; comment from @s-gobriel:

I think it is important to explain the delete functionality from the VCL side.

Deleting descriptors by ID works as follows.

The different index engines handle the delete functionality differently, as follows:

• IndexIVF: stores the descriptor IDs explicitly with the index. As a result, the IDs of the other descriptors do not change after a delete operation.

• IndexFlat (and other FAISS indices that VDMS does not support, such as IndexPQ, which behave the same way): supports the remove_ids function, which deletes the descriptor in question. However, this index does not store IDs explicitly, so a delete operation shifts the ID of every vector with a larger ID down by 1.

• IndexFLINNG: deletion is not supported, because the underlying hash tables do not support delete.
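The difference between the first two bullets can be illustrated with a toy model in pure Python. This is a minimal sketch, not FAISS code: ExplicitIdIndex and PositionalIndex are hypothetical stand-ins (their remove_ids method only echoes the FAISS method name), and strings stand in for vectors. The explicit-ID class keeps the remaining IDs stable after a removal, as described for IndexIVF; the positional class compacts its storage, so later IDs shift down, as described for IndexFlat.

```python
class ExplicitIdIndex:
    """Hypothetical IndexIVF-like toy: IDs are stored next to the vectors."""

    def __init__(self):
        self.entries = {}  # id -> vector
        self.next_id = 0

    def add(self, vector):
        self.entries[self.next_id] = vector
        self.next_id += 1

    def remove_ids(self, ids):
        # Removing an entry leaves every other stored ID unchanged.
        for i in ids:
            self.entries.pop(i, None)


class PositionalIndex:
    """Hypothetical IndexFlat-like toy: a vector's ID is its list position."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        self.vectors.append(vector)

    def remove_ids(self, ids):
        # Compacting the list shifts every later vector to a smaller position.
        drop = set(ids)
        self.vectors = [v for i, v in enumerate(self.vectors) if i not in drop]


explicit, positional = ExplicitIdIndex(), PositionalIndex()
for vector in ["a", "b", "c", "d"]:  # strings stand in for real vectors
    explicit.add(vector)
    positional.add(vector)

explicit.remove_ids([1])
positional.remove_ids([1])

print(explicit.entries)                     # {0: 'a', 2: 'c', 3: 'd'}
print(dict(enumerate(positional.vectors)))  # {0: 'a', 1: 'c', 2: 'd'}
```

In the positional toy, "d" moves from ID 3 to ID 2, which is exactly the shift a client has to account for.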

The VDMS client or the user application needs to account for the behavior described above so that the correct vectors are presented to the application after a deletion.
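One way a client could do that mapping for a positional (IndexFlat-style) index is to keep, alongside the index, a list of each slot's original ID and compact it exactly as the index compacts its storage. This is a minimal sketch under that assumption; PositionalIdMap and its methods are hypothetical names, not part of the VDMS client.

```python
class PositionalIdMap:
    """Hypothetical client-side helper for a positional (IndexFlat-style) index.

    stable_ids[position] holds the ID the application originally assigned to
    the vector currently stored at that position in the index.
    """

    def __init__(self):
        self.stable_ids = []

    def on_add(self, stable_id):
        # New vectors are appended, so they take the next position.
        self.stable_ids.append(stable_id)

    def position_of(self, stable_id):
        # Position to pass to the index when deleting this descriptor.
        return self.stable_ids.index(stable_id)

    def on_remove(self, positions):
        # Compact exactly as the index does, so later positions shift down too.
        drop = set(positions)
        self.stable_ids = [s for p, s in enumerate(self.stable_ids) if p not in drop]

    def resolve(self, positions):
        # Translate positions returned by a KNN search into stable IDs.
        return [self.stable_ids[p] for p in positions]


id_map = PositionalIdMap()
for stable_id in [100, 101, 102, 103, 104]:
    id_map.on_add(stable_id)

# Delete descriptor 102, which currently sits at position 2.
id_map.on_remove([id_map.position_of(102)])

# After the shift, position 2 holds what used to be at position 3.
print(id_map.resolve([2, 0]))  # [103, 100]
```

The design choice here is to mirror the index's compaction rather than recompute offsets per query, so a result position can always be translated back to the ID the application originally assigned.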

Hope this is clear.

BTW, related to the delete functionality, duplicate detection is a trickier issue that can only be handled by the application.

@ifadams
Contributor Author

ifadams commented Sep 30, 2024

Active discussions are underway; here is an update on the diagnosis:

What's going on is a mismatch between the behavior of the KNN index, PMGD, and the client's expectations.

Currently, we allow an "_expiration" field to be included as part of a descriptor. This field sets a timer for automatic deletion (if enabled), which will automatically delete the PMGD graph nodes affiliated with that descriptor.

A KNN search returns the nearest neighbors, and the IDs are used internally to increase the specificity of the query.

However, the index the KNN search runs over does not always support deletion, and currently deletions are not propagated to the index internally. So it's possible for a KNN search to return a "deleted" ID; since that ID no longer matches an existing node in the graph database, nothing is returned for it.
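The mismatch can be reproduced with a toy model that mirrors the steps in the original report (query with the ID=2 vector, delete ID=2, search again with K=5). This is a hypothetical sketch, not VDMS internals: index_vectors stands in for the KNN index, graph_nodes stands in for the PMGD metadata, and deletion is assumed to remove only the graph node.

```python
import math

# Hypothetical stand-ins, not VDMS internals:
#   index_vectors: what the KNN index holds; deletion never reaches it.
#   graph_nodes:   descriptor metadata in the graph database (PMGD).
index_vectors = {
    1: (0.0, 0.0),
    2: (1.0, 0.0),
    3: (0.0, 2.0),
    4: (3.0, 0.0),
    5: (0.0, 4.0),
    6: (5.0, 0.0),
}
graph_nodes = {i: {"ID": i} for i in index_vectors}


def knn(query, k):
    """Brute-force KNN over everything still present in the index."""
    ranked = sorted(index_vectors, key=lambda i: math.dist(query, index_vectors[i]))
    return ranked[:k]


def find_descriptors(query, k):
    """Run KNN first, then keep only IDs that still have a graph node."""
    return [i for i in knn(query, k) if i in graph_nodes]


def delete_descriptor(descriptor_id):
    # Only the graph node is removed; the vector stays in the index.
    graph_nodes.pop(descriptor_id, None)


query = index_vectors[2]
before = find_descriptors(query, k=5)
delete_descriptor(2)
after = find_descriptors(query, k=5)

print(before)  # [2, 1, 4, 3, 6]
print(after)   # [1, 4, 3, 6]: ID 2 still occupies one of the K slots in the index
```

The K=5 search still returns ID 2 from the index, and the graph lookup then drops it, leaving K-1 results as described in the reproduction steps.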


3 participants