-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43535: [C++] Support the AWS S3 SSE-C encryption #43601
base: main
Are you sure you want to change the base?
Conversation
|
7e1f200
to
1134b07
Compare
@pitrou @mapleFU @felipecrv can you help to review the PR, Thanks |
I took a quick look and here are some assorted thoughts:
|
Also, for the other readers wondering, I've found a useful (though possibly outdated?) comparison of S3 encryption options here: |
@pitrou Thanks for your review, for #1, i would try to add ut for the change, for #2, yes, actually ONLY the Key is needed, that's why i only expose one SetSSECKey function, and three Get function, and private the new added datamember. |
@pitrou I've submitted another commit to only expose the ssec-key in S3Options, and then calculate the MD5 every time on the fly, can you help to review whether there are other comments? I would try to add the ut if there is no interface change required, Thanks |
bb2e024
to
539fa7c
Compare
@pitrou Can you help to review the PR again?Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a careful around this weekend
/// \param expect_input_key_size, default 32 | ||
/// \return true if the decode and calculate MD5 success, otherwise return false | ||
ARROW_EXPORT | ||
bool CalculateSSECKeyMD5(const std::string& base64_encoded_key, std::string& md5_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind change to Result<std::string>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I would try to refine the return value for the function.
// Decode the Base64-encoded key to get the raw binary key | ||
Aws::Utils::ByteBuffer rawKey = | ||
Aws::Utils::HashingUtils::Base64Decode(base64_encoded_key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do this handles error if base64_encoded_key
is not valid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://github.com/aws/aws-sdk-cpp/blob/ac2da09e6930e3988d1289717e2df5d4b7408f17/src/aws-cpp-sdk-core/source/utils/base64/Base64.cpp#L91-L121, seems this aws util API didn't validate the input properly,but directly calculate the output size and allocate the buffer, so i add the check to ensure every character is the valid character here.
Meanwhile, I also add the related Unit test for the CalculateSSECKeyMD5 to test the different input value.
|
||
// the key needs to be // 256 bits(32 bytes)according to | ||
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerSideEncryptionCustomerKeys.html#specifying-s3-c-encryption | ||
if (rawKey.GetLength() != expect_input_key_size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this not equal to 256?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the expect_input_key_size is exposed via S3Options, so it's possible for this value to be set as any length.
de05328
to
8649388
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix style by pre-commit run --show-diff-on-failure --color=always --all
?
ASSERT_FALSE(CalculateSSECKeyMD5("").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("%^H").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("INVALID").ok()); // invalid base64 | ||
ASSERT_FALSE(CalculateSSECKeyMD5("MTIzNDU2Nzg5").ok()); // invalid, the input key size not match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use ASSERT_RAISES_WITH_MESSAGES()
instead?
@@ -443,6 +443,7 @@ class TestS3FS : public S3TestMixin { | |||
// Most tests will create buckets | |||
options_.allow_bucket_creation = true; | |||
options_.allow_bucket_deletion = true; | |||
options_.sse_customer_key = "1WH9aTJ0+Tn0NLbTMHZn9aCW3Li3ViAdBsoIldPCREw="; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to cover the workflow with sse_customer_key nonempty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we always set sse_custom_key
, we can't test no sse_custom_key
case.
Can we add a test that uses sse_custom_key
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, and ideally the SSE-C encryption test would write a file with the encryption key, and check that trying to read it with another key (or with no key at all) fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Could you update the PR description to reflect the current implementation? |
I'll wait for re-review from pitrou in this week. |
…face to set the ca for windows
…add the SSEC related function after 1.9.201
… in linux as the minio with mingw could NOT work well
…the host could success during the TLS handshake
50627c2
to
3fce2f5
Compare
class TestS3FSHTTP : public TestS3FS { | ||
public: | ||
void SetUp() override { | ||
enable_tls_if_supported_ = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this necessary? This looks like a weird workaround, can there at least be an explanation in comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've temporarily disabled this workaround here: a69961b
We'll see how CI goes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it can occur sporadically even in non-stress tests:
https://github.com/ursacomputing/crossbow/actions/runs/11388191999/job/31684559075#step:7:3791
[ RUN ] TestS3FS.OpenOutputStreamMetadata
/arrow/cpp/src/arrow/filesystem/s3fs_test.cc:593: Failure
Failed
'output->Close()' failed with IOError: When completing multiple part upload for key 'mdfile1' in bucket 'bucket': AWS Error NO_SUCH_UPLOAD during CompleteMultipartUpload operation: The specified multipart upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
/arrow/cpp/src/arrow/filesystem/s3fs.cc:1830 CleanupIfFailed(FinishPartUploadAfterFlush())
[ FAILED ] TestS3FS.OpenOutputStreamMetadata (215 ms)
@kou Do you know if we were seeing this error before switching to HTTPS for Minio-based tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may have to do with forcing "localhost" for TLS certificate check purposes: in some cases, a port number might be reused?
What we should perhaps do is:
- switch all tests to HTTP except the SSE customer key tests
- use "localhost" only for HTTPS tests, otherwise use
GetListenAddress()
without a host argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pitrou Fine, i've done the related change in the latest commit. please help to review again.
@github-actions crossbow submit -g cpp |
Revision: a69961b Submitted crossbow builds: ursacomputing/crossbow @ actions-cc413ca29c |
…r http mionio server, rollback to use the random host ip
@pitrou is it possible to help review the change again, Thanks. |
Rationale for this change
server-side encryption with customer-provided keys is an important security feature for aws s3, it's useful when user want to manager the encryption key themselves, say, they don't want the data to be exposed to the aws system admin, and ensure the object is safe even the ACCESS_KEY and SECRET_KEY is somehow leaked.
Some comparison of S3 encryption options :
https://www.linkedin.com/pulse/delusion-s3-encryption-benefits-ravi-ivaturi/
What changes are included in this PR?
Add the sse_customer_key member for S3Options to support server-side encryption with customer-provided keys (SSE-C keys).
Add the tls_ca_file_path, tls_ca_dir_path and tls_verify_certificates members for S3Options.
Refine the unit test to start the minio server with self-signed certificate on linux platform, so the unit test could cover the https case on linux, and http case on non-linux platform.
Are these changes tested?
Yes
Are there any user-facing changes?
yes, the new member sse_customer_key tls_ca_file_path, tls_ca_dir_path and tls_verify_certificates added for S3Options