Handle zero chunk size #263

Open
wants to merge 1 commit into
base: master

moshaad7
Contributor

The existing implementation of the method
getChunkSize(...) (uint64, error), in some cases,
can return chunkSize==0 and err==nil.

The onus is on the caller to check for this possibility and handle it properly.
Callers often use the returned chunkSize as a divisor, and a zero chunkSize leads to a panic.
See #209.

This PR updates the method implementation to always return an error when the returned chunkSize value is 0.
That way, callers only need to check whether the error is non-nil.

Callers that are fine with a zero chunkSize can check the returned error against ErrChunkSizeZero.
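
To make the proposed contract concrete, here is a minimal sketch of the sentinel-error pattern described above. Only ErrChunkSizeZero comes from this PR; chunkSizeFor, chunkOf, their parameters, and the size computation are hypothetical stand-ins and are not zapx's actual getChunkSize logic.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrChunkSizeZero is the sentinel named in the PR description.
var ErrChunkSizeZero = errors.New("chunk size is zero")

// chunkSizeFor is a hypothetical stand-in for getChunkSize(...).
// The computation is illustrative only; the point is that a zero
// result is never returned together with a nil error.
func chunkSizeFor(cardinality, maxDocs uint64) (uint64, error) {
	if cardinality == 0 {
		return 0, ErrChunkSizeZero
	}
	size := maxDocs / cardinality
	if size == 0 {
		return 0, ErrChunkSizeZero
	}
	return size, nil
}

// chunkOf is a hypothetical caller that uses the chunk size as a divisor.
// It only has to check err; it never divides by zero.
func chunkOf(docNum, cardinality, maxDocs uint64) (uint64, error) {
	size, err := chunkSizeFor(cardinality, maxDocs)
	if err != nil {
		return 0, err
	}
	return docNum / size, nil
}

func main() {
	// A caller that tolerates a zero chunk size checks for the sentinel.
	if _, err := chunkOf(42, 0, 1000); errors.Is(err, ErrChunkSizeZero) {
		fmt.Println("zero chunk size: treating the postings list as empty")
	}

	chunk, err := chunkOf(250, 10, 1000)
	fmt.Println(chunk, err)
}
```

Returning the sentinel unwrapped keeps errors.Is and a plain equality check both valid; if the real method wraps it with extra context, callers would need errors.Is.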

@moshaad7 moshaad7 mentioned this pull request Sep 12, 2024
@moshaad7 moshaad7 self-assigned this Sep 12, 2024
@moshaad7 moshaad7 linked an issue Sep 12, 2024 that may be closed by this pull request
Member

@abhinavdangeti abhinavdangeti left a comment


@moshaad7 per our conversation, let's get rid of the redundant check in posting.go now that you've handled it within this method.

Let's also get some clarity on what happens when we return an error when we stumble into this situation.

Member

@abhinavdangeti abhinavdangeti left a comment


Per previous comment.

@abhinavdangeti
Member

@moshaad7 check out the latest 2 stack traces shared here - blevesearch/bleve#1662 (comment) .

It seems even if we handle the error case of chunkSize being zero prior to setting posting's attribute, you could run into divide by zero panics downstream on account of not updating the attribute in line 644 and 747 of posting.go which we'll need to handle as well.

@moshaad7
Contributor Author

moshaad7 commented Sep 17, 2024

@moshaad7 check out the latest 2 stack traces shared here - blevesearch/bleve#1662 (comment) .

It seems even if we handle the error case of chunkSize being zero prior to setting posting's attribute, you could run into divide by zero panics downstream on account of not updating the attribute in line 644 and 747 of posting.go which we'll need to handle as well.

@abhinavdangeti I agree.
The challenging part is this: if we look at the implementation of the nextDocNumAtOrAfter() method,

func (i *PostingsIterator) nextDocNumAtOrAfter(...) {
	...

	if i.Actual == nil || !i.Actual.HasNext() {
		return 0, false, nil
	}

	if i.postings == nil || i.postings == emptyPostingsList {
		// couldn't find anything
		return 0, false, nil
	}

	...
}

A PostingsList can only pass these checks and cause a panic if

  • PostingsList.postings (the roaring bitmap) has more than 1 item in it, and
  • PostingsList.chunkSize==0

I tried several flows, by writing test cases, to get a PostingsList into this state,
and also tried to reproduce it by reading through the code, but couldn't.

I then wrote the test case below to force the panic situation, and I can see the divide by zero when the above two conditions are met.

func TestDictWithNilFstReader(t *testing.T) {
	preAllocPl := &PostingsList{} // postings list with nil postings and zero chunkSize

	// Add an item just to simulate a populated postings list
	preAllocPl.postings = roaring.New()
	preAllocPl.postings.Add(1000) // this will be cleared by postingsListInit

	except := roaring.New()
	except.Add(1000)

	dict := &Dictionary{} // dictionary with nil fst reader
	pl, err := dict.PostingsList([]byte("foo"), except, preAllocPl)
	if err != nil {
		t.Fatalf("expected nil error, got: %v", err)
	}

	// Add some more items to the postings list,
	// so that our PostingsList now has non-nil postings and a zero chunkSize.
	//
	// Since fstReader is nil, this simulation isn't realistic.
	preAllocPl.postings.Add(1000)
	preAllocPl.postings.Add(2000)
	preAllocPl.postings.Add(3000)

	if pl != nil {
		iter := pl.Iterator(false, false, false, nil)
		posting, err := iter.Next()
		if err != nil {
			t.Fatalf("expected nil error, got: %v", err)
		}
		t.Logf("posting: %+v", posting)
	}
}
Running tool: /usr/local/go/bin/go test -timeout 30m -run ^TestDictWithNilFstReader$ github.com/blevesearch/zapx/v16

=== RUN   TestDictWithNilFstReader
--- FAIL: TestDictWithNilFstReader (0.00s)
panic: runtime error: integer divide by zero [recovered]
        panic: runtime error: integer divide by zero

goroutine 34 [running]:
testing.tRunner.func1.2({0x1026d3f00, 0x102810010})
        /usr/local/go/src/testing/testing.go:1632 +0x1bc
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1635 +0x334
panic({0x1026d3f00?, 0x102810010?})
        /usr/local/go/src/runtime/panic.go:785 +0x124
github.com/blevesearch/zapx/v16.(*PostingsIterator).nextDocNumAtOrAfter(0x14000176000, 0x0)
        /Users/shaad/fts2/blevesearch/zapx/posting.go:641 +0x338
github.com/blevesearch/zapx/v16.(*PostingsIterator).nextAtOrAfter(0x14000176000, 0x14000110228?)
        /Users/shaad/fts2/blevesearch/zapx/posting.go:534 +0x24
github.com/blevesearch/zapx/v16.(*PostingsIterator).Next(0x14000150140?)
        /Users/shaad/fts2/blevesearch/zapx/posting.go:523 +0x20
github.com/blevesearch/zapx/v16.TestDictWithNilFstReader(0x14000118820)
        /Users/shaad/fts2/blevesearch/zapx/dict_test.go:299 +0x16c
testing.tRunner(0x14000118820, 0x102702840)
        /usr/local/go/src/testing/testing.go:1690 +0xe4
created by testing.(*T).Run in goroutine 1
        /usr/local/go/src/testing/testing.go:1743 +0x314
FAIL    github.com/blevesearch/zapx/v16 1.280s

For now, we can add chunkSize!=0 checks in this method to avoid the panic.
In that case, we can treat the PostingsList as an emptyPostingsList.
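
As a rough illustration of that guard, the sketch below treats a zero chunkSize the same way as an empty postings list before any division happens. The iterState type, its fields, and nextChunk are hypothetical simplifications (a slice stands in for the roaring bitmap); this is not the actual PostingsIterator code.

```go
package main

import "fmt"

// iterState is a hypothetical, stripped-down stand-in for the fields of
// PostingsIterator that matter for this check.
type iterState struct {
	docNums   []uint64 // stand-in for the roaring bitmap contents, sorted
	chunkSize uint64
}

// nextChunk returns the chunk that holds the first doc number >= atOrAfter.
// The bool is false when nothing can be returned, which includes the
// zero-chunkSize case: it is handled like an empty postings list.
func (s *iterState) nextChunk(atOrAfter uint64) (uint64, bool) {
	if len(s.docNums) == 0 || s.chunkSize == 0 {
		return 0, false // guard runs before any division
	}
	for _, d := range s.docNums {
		if d >= atOrAfter {
			return d / s.chunkSize, true
		}
	}
	return 0, false
}

func main() {
	// Populated postings with a zero chunkSize: the state forced by the test above.
	bad := &iterState{docNums: []uint64{1000, 2000, 3000}}
	_, ok := bad.nextChunk(0)
	fmt.Println("zero chunkSize yields a result:", ok)

	good := &iterState{docNums: []uint64{1000, 2000, 3000}, chunkSize: 1024}
	chunk, ok := good.nextChunk(1500)
	fmt.Println(chunk, ok)
}
```

The trade-off is that a malformed postings list silently looks empty, so the real change may also want to log or surface the condition.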

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment


@moshaad7 one question still stands though - if we report this error now (better than the panic of course), how is the situation dealt with by the caller(s)?

@moshaad7
Contributor Author

@moshaad7 one question still stands though - if we report this error now (better than the panic of course), how is the situation dealt with by the caller(s)?

As it stands today, the returned error will propagate all the way up to the user (the query will fail).
When we last discussed this, we were thinking of returning partial results to the user (from the segments that didn't return an error), possibly along with an error map keyed by segment.

This is more of a product decision; I will need the team's consensus before I implement it.

But just to give a glimpse of the changes that would be required:
we would have to update IndexSnapshotTermFieldReader so that it does not stop iterating over the PostingsIterator(s) when it encounters an error; instead, it should record the error in the map and continue with the next PostingsIterator.
Then, instead of returning an error, it would return the partial results and the error map.
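
The shape of that change could look roughly like the sketch below. It is only an illustration of "collect errors per segment and keep going"; segmentIter and collectPartial are hypothetical names, and the real IndexSnapshotTermFieldReader has a different, streaming interface.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrChunkSizeZero is the sentinel named in this PR.
var ErrChunkSizeZero = errors.New("chunk size is zero")

// segmentIter is a hypothetical stand-in for draining one segment's
// PostingsIterator: it returns that segment's doc numbers or an error.
type segmentIter func() ([]uint64, error)

// collectPartial visits every segment. A failing segment is recorded in the
// error map (keyed by segment index) and skipped, so the remaining segments
// still contribute their results.
func collectPartial(segments []segmentIter) ([]uint64, map[int]error) {
	var hits []uint64
	errs := make(map[int]error)
	for i, next := range segments {
		docs, err := next()
		if err != nil {
			errs[i] = err
			continue
		}
		hits = append(hits, docs...)
	}
	return hits, errs
}

func main() {
	segments := []segmentIter{
		func() ([]uint64, error) { return []uint64{1, 2}, nil },
		func() ([]uint64, error) { return nil, ErrChunkSizeZero },
		func() ([]uint64, error) { return []uint64{7}, nil },
	}
	hits, errs := collectPartial(segments)
	fmt.Println("hits:", hits)
	for seg, err := range errs {
		fmt.Printf("segment %d failed: %v\n", seg, err)
	}
}
```

Whether a query with a non-empty error map should count as a success or a failure is exactly the product decision mentioned above.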

Development

Successfully merging this pull request may close these issues.

runtime error: integer divide by zero
2 participants