Update default engine to FAISS #2221

Open
wants to merge 2 commits into
base: main
from

Conversation

VijayanB
Member

@VijayanB VijayanB commented Oct 20, 2024

Description

  • Update default engine to FAISS.
  • Fixed tests that assumed the default engine was nmslib: they now either set nmslib explicitly or assert on faiss-specific artifacts (such as the file extension) instead of nmslib artifacts.
  • Removed tests that only validated that the default engine (when it was nmslib) does not support certain features.

Related Issues

#2163

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Since faiss supports more features than nmslib, and we have seen
data points indicating that more vector search users are
interested in faiss, we are updating the default engine to
faiss. This will benefit users who prefer to rely on
defaults while working with vector search.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
@VijayanB VijayanB force-pushed the update-default-engine branch 3 times, most recently from 96aca39 to 97240f3 Compare October 20, 2024 21:54
Signed-off-by: Vijayan Balasubramanian <[email protected]>
Member

@vibrantvarun vibrantvarun left a comment

Looks good! Will wait for the other engineers' reviews.

@@ -203,7 +203,7 @@ static KNNMethodContext createKNNMethodContextFromLegacy(
? topLevelSpaceType
: KNNVectorFieldMapperUtil.getSpaceType(indexSettings);
return new KNNMethodContext(
KNNEngine.NMSLIB,
Member

Keep as NMSLIB - nmslib is only legacy support

Member

Maybe add a comment too, like I should have done, haha.
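
The suggestion in this thread, keeping the legacy path pinned to NMSLIB and saying so in a comment, might look roughly like the sketch below. `Engine` and `LegacyMethodContext` are simplified, hypothetical stand-ins for illustration, not the plugin's actual `KNNEngine` and `KNNMethodContext` classes.

```java
// Hypothetical stand-in for the plugin's engine enum.
enum Engine {
    NMSLIB, FAISS, LUCENE;

    // Default for mappings that do not name an engine; this PR moves it to FAISS.
    static final Engine DEFAULT = FAISS;
}

// Hypothetical stand-in for a method context built from a field mapping.
final class LegacyMethodContext {
    final Engine engine;

    LegacyMethodContext(Engine engine) {
        this.engine = engine;
    }

    // Legacy mappings are only supported on nmslib, so the engine is pinned
    // here on purpose and must NOT follow Engine.DEFAULT.
    static LegacyMethodContext fromLegacy() {
        return new LegacyMethodContext(Engine.NMSLIB);
    }

    // Mappings that omit the engine pick up whatever the current default is.
    static LegacyMethodContext fromDefaults() {
        return new LegacyMethodContext(Engine.DEFAULT);
    }
}
```

With the comment in place, a later change to the default is less likely to be applied to the legacy path by mistake.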

@jmazanec15
Member

@@ -29,7 +29,7 @@ public enum KNNEngine implements KNNLibrary {
FAISS(FAISS_NAME, Faiss.INSTANCE),
LUCENE(LUCENE_NAME, Lucene.INSTANCE);

-public static final KNNEngine DEFAULT = NMSLIB;
+public static final KNNEngine DEFAULT = FAISS;
Collaborator

@navneet1v navneet1v Oct 21, 2024

How are we handling backward compatibility? Also, can we validate that we have proper BWC tests covering this default engine change?

Contributor

I was just curious why we need to handle BWC in this case. Say I created an index without specifying any engine, so it picked NMSLIB. That information is already in the cluster state, serialized, and passed to all nodes. At query time we read the engine from there, so it does not depend on the new default.
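
The reasoning in this comment can be sketched as a small model. It assumes the engine resolved at index creation is what gets persisted, which is the commenter's premise and the thing a BWC test would need to confirm. `EngineResolutionSketch` and its methods are hypothetical names, and the map stands in for cluster state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reasoning above; not the plugin's actual code.
final class EngineResolutionSketch {
    // Stand-in for cluster state: field name -> engine persisted at creation time.
    private final Map<String, String> persistedEngineByField = new HashMap<>();
    private String defaultEngine;

    EngineResolutionSketch(String defaultEngine) {
        this.defaultEngine = defaultEngine;
    }

    // Simulates upgrading to a version that ships a different default.
    void setDefaultEngine(String newDefault) {
        this.defaultEngine = newDefault;
    }

    // The default is consulted only here, when the field is created.
    void createField(String field, String requestedEngine) {
        String resolved = requestedEngine != null ? requestedEngine : defaultEngine;
        persistedEngineByField.put(field, resolved);
    }

    // Query time reads the persisted value and never looks at the default.
    String engineAtQueryTime(String field) {
        return persistedEngineByField.get(field);
    }
}
```

Under that assumption, a field created while the default was nmslib keeps resolving to nmslib after the default changes. If the mapping is instead re-parsed without a stored engine, the new default would leak into old indices, which is why the BWC question still matters.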

@@ -114,7 +115,7 @@ public void testGetAllEngineFileContexts() throws IOException, ExecutionExceptio
engineFileContexts = knnIndexShard.getAllEngineFileContexts(searcher.getIndexReader());
assertEquals(1, engineFileContexts.size());
List<String> paths = engineFileContexts.stream().map(KNNIndexShard.EngineFileContext::getIndexPath).collect(Collectors.toList());
-assertTrue(paths.get(0).contains("hnsw") || paths.get(0).contains("hnswc"));
+assertTrue(paths.get(0).contains(FAISS_HNSW_EXTENSION));
Collaborator

Is the compound extension covered?
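
One way to answer this is to check both suffixes explicitly, as in the sketch below. The extension values `.faiss` and `.faissc` and the helper name are assumptions for illustration; the real constants live in the plugin and should be used instead. Note that a `contains` check on the plain extension also matches the compound one when the compound extension only appends a character, which is why the second clause of the old `contains("hnsw") || contains("hnswc")` assertion was redundant.

```java
// Hypothetical helper; the extension values are assumed, not taken from the PR.
final class EngineFileAssertionSketch {
    static final String FAISS_EXTENSION = ".faiss";
    static final String FAISS_COMPOUND_EXTENSION = ".faissc";

    // True when the path ends with either the plain or the compound faiss suffix.
    static boolean isFaissEngineFile(String path) {
        return path.endsWith(FAISS_EXTENSION) || path.endsWith(FAISS_COMPOUND_EXTENSION);
    }
}
```

Using `endsWith` on each suffix makes the compound case visible in the test rather than covered by accident.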

@@ -263,29 +263,6 @@ public void testByteVectorDataTypeWithNmslibEngine() {
assertTrue(ex.getMessage().contains("is not supported for vector data type"));
}

-@SneakyThrows
-public void testByteVectorDataTypeWithLegacyFieldMapperKnnIndexSetting() {
Collaborator

Just checking: is this case still covered when the engine is explicitly NMSLIB?

@@ -146,7 +146,7 @@ public void testTypeParser_build_fromKnnMethodContext() throws IOException {
// Check that knnMethodContext takes precedent over both model and legacy
ModelDao modelDao = mock(ModelDao.class);

-SpaceType spaceType = SpaceType.COSINESIMIL;
+SpaceType spaceType = SpaceType.DEFAULT;
Collaborator

Out of curiosity, why this change?

6 participants