Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky Test from NativeMemoryAllocationTests #2169

Open
VijayanB opened this issue Sep 30, 2024 · 2 comments
Open

Flaky Test from NativeMemoryAllocationTests #2169

VijayanB opened this issue Sep 30, 2024 · 2 comments
Assignees

Comments

@VijayanB
Copy link
Member

Flaky test is coming from https://github.com/opensearch-project/k-NN/actions/runs/11113230012/job/30877097217?pr=2167

Tests with failures:
888 tests completed, 3 failed, 223 skipped
 - org.opensearch.knn.index.memory.NativeMemoryAllocationTests.testIndexAllocation_closeBlocking
 - org.opensearch.knn.index.memory.NativeMemoryAllocationTests.classMethod
 REPRODUCE WITH: gradlew ':test' --tests "org.opensearch.knn.index.memory.NativeMemoryAllocationTests" -Dtests.seed=9A46305B6CCE0712 -Dtests.security.manager=false -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Druntime.java=21
 gradlew ':test' --tests "org.opensearch.knn.index.memory.NativeMemoryAllocationTests" -Dtests.seed=9A46305B6CCE0712 -Dtests.security.manager=false -Dtests.locale=en-US -Dtests.timezone=Etc/UTC -Druntime.java=21
 gradlew ':test' --tests "org.opensearch.knn.index.memory.NativeMemoryAllocationTests.testIndexAllocation_closeBlocking" -Dtests.seed=9A46305B6CCE0712 -Dtests.security.manager=false -Dtests.locale=tt -Dtests.timezone=Europe/Ljubljana -Druntime.java=21
@0ctopus13prime
Copy link
Contributor

This PR(#2139) is trying to fix this flaky testing.
In which, it changed its way to read a lock first then followed by calling close().

    public void testIndexAllocation_closeBlocking() throws InterruptedException, ExecutionException {
        // Prepare mocking and a thread pool.
        WatcherHandle<FileWatcher> watcherHandle = (WatcherHandle<FileWatcher>) mock(WatcherHandle.class);
        ExecutorService executorService = Executors.newSingleThreadExecutor();

        // Enable `KNN_FORCE_EVICT_CACHE_ENABLED_SETTING` to force it to block other threads.
        // Having it false will make `IndexAllocation` to run close logic in a different thread.
        when(clusterSettings.get(KNN_FORCE_EVICT_CACHE_ENABLED_SETTING)).thenReturn(true);
        NativeMemoryAllocation.IndexAllocation blockingIndexAllocation = new NativeMemoryAllocation.IndexAllocation(
            mock(ExecutorService.class),
            0,
            0,
            null,
            "test",
            "test",
            watcherHandle
        );

        // Acquire a read lock
        blockingIndexAllocation.readLock();

        // This should be blocked as a read lock is still being held.
        Future<?> closingThread = executorService.submit(blockingIndexAllocation::close);

        // Check if thread is currently blocked
        try {
            closingThread.get(5, TimeUnit.SECONDS);
            fail("Closing should be blocked. We are still holding a read lock.");
        } catch (TimeoutException ignored) {}

        // Now, we unlock a read lock.
        blockingIndexAllocation.readUnlock();
        // As we don't hold any locking, the closing thread can now good to acquire a write lock.
        closingThread.get();

        // Waits until close
        assertTrue(blockingIndexAllocation.isClosed());
        executorService.shutdown();
    }

@0ctopus13prime
Copy link
Contributor

PR(#2139) fixed the timing issue.
We can close this and reopen issue if it is still reoccuring.
But at least, in the PR, the failed testing now has passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Backlog (Hot)
Development

No branches or pull requests

4 participants