Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Use DH S3Instructions to build Iceberg AWS clients #6113
base: main
Are you sure you want to change the base?
feat: Use DH S3Instructions to build Iceberg AWS clients #6113
Changes from all commits
8066110
4dc52bd
9c13e93
de78d8a
28f7643
c8ecd2c
3804ef9
385ac25
b909ff3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so weird that spotless let this one pass.
Also, typo in
S3Instructions
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the glue, kms and dynamodb clients creation covered under any tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; you could argue we should prefer to throw unsupported for DynamoDB / KMS, but it was easy enough construct those clients if some Iceberg catalog happens to use them (I'm not sure under what circumstances, but it is part of that API...).
I don't know if we've done any Glue testing. https://docs.localstack.cloud/user-guide/aws/glue/ might be possible with the "pro" version, not sure what that entails. https://docs.localstack.cloud/user-guide/aws/glue/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, the implementation does seem straightforward and the internal methods are being touched in the S3 code, so I think that part should be okay.
The only thing I am slightly worried about is preloading in case of kms and dynamo clients.
You have a comment stating glue client is preloaded, is it possible to verify the same for kms and dynamo too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done some manual testing of the Glue Catalog with the DeephavenAwsClientFactory, and it works. It was a good exercise though, because I realized that there is a new S3Client built for ever Table load (at least for the Glue catalog), so we can't immediately invoke the cleanup Runnable. After a quick change, it worked.
The only caller I see of kms() is LakeFormationAwsClientFactory - it's possible that some private Catalogs use it, but I'm not sure. It seems like this factory has something do do with Role ARN, and it may not actually be in use anymore since I think Iceberg has expanded how configurable AWS is since the creation of it.
The only caller of dynamodb() is DynamoDbCatalog; it may be possible to test it out with Localstack since it looks to be freely available, but it's just a matter of how much time do we want to spend trying? https://docs.localstack.cloud/user-guide/aws/dynamodb/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking into this.
At this point, maybe just add a TODO comment for kms and dynamodb, for testing it in future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I had to do something similar for the
S3Request
objects, Ryan suggested usingCleanupReferenceProcessor
. You can check that too, if that has any advantages.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we had to deprecate these?
They might still give us coverage on a lot of smaller cases that Larry tested.
Is it deprecated in a sense that new tests should not be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly that I think we should not add new tests here, and work to migrating them as mentioned in
IcebergToolsTest
. @lbooker42 has so far owned this layer, but it should be relatively easy to migrate to db_resource, and then we get the benefit of:I see db_resource as mainly a way to test out how well we can interoperate with Iceberg that has been written via different processes (pyiceberg, spark, etc).
For more thorough testing (once we have our own writing support) we should be able to extend
SqliteCatalogBase
(or create further specialized tests that look similar to it), which can work with any warehouse - currently the same logic is tested out via local disk, minio, and localstack.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, these new catalog testing code is the more comprehensive way for all future tests. Once we can migrate these tests so we don't lose any coverage, we should remove these as well as the IcebergTestCatalog class.
cc: @lbooker42