Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test failure: TestMetadataLockContextCancel #312

Open
morgo opened this issue Jun 27, 2024 · 4 comments · May be fixed by #314
Open

test failure: TestMetadataLockContextCancel #312

morgo opened this issue Jun 27, 2024 · 4 comments · May be fixed by #314
Assignees

Comments

@morgo
Copy link
Collaborator

morgo commented Jun 27, 2024

This looks to have been recently introduced:

2024-06-27T20:07:49.4809983Z --- FAIL: TestMetadataLockContextCancel (0.00s)
2024-06-27T20:07:49.4810995Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-06-27T20:07:49.4812203Z 	panic: runtime error: invalid memory address or nil pointer dereference
2024-06-27T20:07:49.4813253Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x9fefea]
2024-06-27T20:07:49.4813911Z 
2024-06-27T20:07:49.4814088Z goroutine 74 [running]:
2024-06-27T20:07:49.4814632Z testing.tRunner.func1.2({0xa4c1a0, 0xede880})
2024-06-27T20:07:49.4815593Z 	/opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1631 +0x3f7
2024-06-27T20:07:49.4816417Z testing.tRunner.func1()
2024-06-27T20:07:49.4817156Z 	/opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1634 +0x6b6
2024-06-27T20:07:49.4817981Z panic({0xa4c1a0?, 0xede880?})
2024-06-27T20:07:49.4818765Z 	/opt/hostedtoolcache/go/1.22.4/x64/src/runtime/panic.go:770 +0x132
2024-06-27T20:07:49.4819806Z github.com/cashapp/spirit/pkg/dbconn.(*MetadataLock).Close(...)
2024-06-27T20:07:49.4820886Z 	/home/runner/work/spirit/spirit/pkg/dbconn/metadatalock.go:102
2024-06-27T20:07:49.4822157Z github.com/cashapp/spirit/pkg/dbconn.TestMetadataLockContextCancel(0xc0000b0340)
2024-06-27T20:07:49.4823544Z 	/home/runner/work/spirit/spirit/pkg/dbconn/metadatalock_test.go:52 +0x3ea
2024-06-27T20:07:49.4824487Z testing.tRunner(0xc0000b0340, 0xb02c68)
2024-06-27T20:07:49.4825373Z 	/opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1689 +0x21f
2024-06-27T20:07:49.4826254Z created by testing.(*T).Run in goroutine 1
2024-06-27T20:07:49.4833387Z 	/opt/hostedtoolcache/go/1.22.4/x64/src/testing/testing.go:1742 +0x826
2024-06-27T20:07:49.4834423Z FAIL	github.com/cashapp/spirit/pkg/dbconn	4.082s

We might need to extend timeouts etc to make sure it doesn't occur.

@jayjanssen jayjanssen self-assigned this Jun 28, 2024
@jayjanssen
Copy link
Collaborator

This is pretty weird. I'm struggling to understand how the test is getting into this situation. This Close() call in the test is winding up with a situation where the cancel func in Close()is nil.

BUT, the cancel attribute in the mdl struct is set in the NewMetadataLock call, which happened before the Close() call.

The only way this makes sense to me is if the context.WithCancel(ctx) can return a nil cancel func. I thought maybe if the parent context got canceled before the child context was created it might return a nil cancel func, but that does not seem to be the case.

Without that understanding how we got here, I think the best I can do is refactor slightly to let Close() make a best effort attempt to clean up the MetadataLock even if the cancel is nil.

@jayjanssen jayjanssen linked a pull request Jun 28, 2024 that will close this issue
@jayjanssen
Copy link
Collaborator

@morgo I have some possible changes, but want to get your opinion on how this happened in the first place.

@morgo morgo changed the title race in tests race in test: TestMetadataLockContextCancel Jun 28, 2024
@morgo morgo changed the title race in test: TestMetadataLockContextCancel test failure: TestMetadataLockContextCancel Jun 28, 2024
@morgo
Copy link
Collaborator Author

morgo commented Jun 28, 2024

@morgo I have some possible changes, but want to get your opinion on how this happened in the first place.

I'm not sure I can figure it out either :( I'll wait until I catch it again and try to investigate more.

@morgo
Copy link
Collaborator Author

morgo commented Sep 4, 2024

This is a better catch:

2024-09-04T14:09:20.5556212Z time="2024-09-04T14:09:20Z" level=info msg="attempting to acquire metadata lock: test-cancel"
2024-09-04T14:09:20.5592246Z     metadatalock_test.go:50: 
2024-09-04T14:09:20.5593830Z         	Error Trace:	/home/runner/work/spirit/spirit/pkg/dbconn/metadatalock_test.go:50
2024-09-04T14:09:20.5595417Z         	Error:      	Received unexpected error:
2024-09-04T14:09:20.5597563Z         	            	could not acquire metadata lock: test-cancel, lock is held by another connection
2024-09-04T14:09:20.5598866Z         	Test:       	TestMetadataLockContextCancel
2024-09-04T14:09:20.5599936Z     metadatalock_test.go:51: 
2024-09-04T14:09:20.5601383Z         	Error Trace:	/home/runner/work/spirit/spirit/pkg/dbconn/metadatalock_test.go:51
2024-09-04T14:09:20.5605448Z         	Error:      	Expected value not to be nil.
2024-09-04T14:09:20.5606941Z         	Test:       	TestMetadataLockContextCancel
2024-09-04T14:09:20.5608032Z --- FAIL: TestMetadataLockContextCancel (0.01s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants