Fix recall for Hamming distance #508

Merged (1 commit) on Apr 8, 2024


@ankane (Contributor) commented Apr 6, 2024

Both sift-256-hamming and word2bits-800-hamming always report 0 recall because the distances stored in their HDF5 files are floats between 0 and 1 rather than raw Hamming distances. Multiplying the stored distances by the dataset's number of dimensions fixes it.
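
To illustrate the scale mismatch described above, here is a minimal sketch. It assumes recall is counted by comparing each returned neighbor's recomputed distance against a threshold taken from the stored ground-truth distances. The helper names (`hamming`, `recall`), the `epsilon` parameter, and the toy data are hypothetical and not taken from the repository.

```python
def hamming(a, b):
    """Raw Hamming distance: number of positions where the bits differ."""
    return sum(x != y for x, y in zip(a, b))


def recall(candidate_distances, true_distances, count, epsilon=1e-3):
    """Fraction of the top-`count` results whose distance falls within the
    ground-truth threshold (hypothetical, simplified recall measure)."""
    threshold = true_distances[count - 1] + epsilon
    hits = sum(d <= threshold for d in candidate_distances[:count])
    return hits / count


query = [1, 0, 1, 1, 0, 0, 1, 0]
neighbors = [
    [1, 0, 1, 1, 0, 0, 1, 1],  # differs from the query in 1 bit
    [1, 1, 1, 1, 0, 0, 0, 0],  # differs from the query in 2 bits
]
dim = len(query)

# Distances recomputed at evaluation time are raw bit counts.
computed = [hamming(query, n) for n in neighbors]

# Ground truth as described in the comment above: floats between 0 and 1.
stored = [d / dim for d in computed]

# Comparing bit counts against a threshold below 1 rejects every neighbor.
recall_unscaled = recall(computed, stored, count=2)

# Multiplying the stored distances by the dimension restores the scale.
rescaled = [d * dim for d in stored]
recall_rescaled = recall(computed, rescaled, count=2)
```

With these toy vectors, the unscaled comparison accepts no neighbors, while the rescaled one accepts both.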

This issue is likely the cause of #420.

@ankane (Contributor, Author) commented Apr 6, 2024

Another approach would be to change the Hamming distance metric to use the mean instead of the sum.

 metrics = {
     "hamming": Metric(
-        distance=lambda a, b: np.sum(a.astype(np.bool_) ^ b.astype(np.bool_)),
+        distance=lambda a, b: np.mean(a.astype(np.bool_) ^ b.astype(np.bool_)),
         distance_valid=lambda a: True
     ),
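
The relationship between the two variants in the diff can be sketched in plain Python. The mean of the XOR is the sum divided by the number of dimensions, so it always lands between 0 and 1. The function names here are hypothetical, and plain lists stand in for the NumPy arrays used in the diff.

```python
def hamming_sum(a, b):
    """Count of differing positions (mirrors the np.sum variant)."""
    return sum(bool(x) ^ bool(y) for x, y in zip(a, b))


def hamming_mean(a, b):
    """Fraction of differing positions (mirrors the np.mean variant)."""
    return hamming_sum(a, b) / len(a)


a = [1, 0, 1, 1]
b = [1, 1, 0, 1]

total = hamming_sum(a, b)      # differing bits as a count
fraction = hamming_mean(a, b)  # differing bits as a fraction of len(a)
```

Either definition ranks neighbors identically for a fixed dimension. What matters for recall is that the stored ground-truth distances and the recomputed distances use the same scale.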

@ankane changed the title from "Fix metrics for sift-256-hamming and word2bits-800-hamming" to "Fix recall for Hamming distance" on Apr 6, 2024
@maumueller merged commit 13e3629 into erikbern:main on Apr 8, 2024
@maumueller (Collaborator) commented

Thanks!

@ankane (Contributor, Author) commented Apr 8, 2024

Thanks @maumueller!

@ankane mentioned this pull request on Apr 16, 2024