Various improvements to the GCS VFS. #4008
This pull request has been linked to Shortcut Story #26982: Remove polling for bucket/object propagation/deletion from the GCS VFS.
return Status::Ok();
return {std::move(bucket_name), std::move(object_path)};
I wonder, does RVO kick in here or are these moves necessary?
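On the RVO question: a minimal sketch, assuming the function returns a `std::pair` (the actual return type is not shown in this thread). Copy elision applies to the returned pair itself, but the named locals inside the braced list are lvalues, so they are copied into the pair's members unless they are moved explicitly. The implicit move on `return` only applies when the operand is the name of a local variable, not to elements of a braced list. `Tracked`, `split_without_move`, `split_with_move`, and the counters are hypothetical names used only for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical counters, used only to observe which constructor runs.
int g_copies = 0;
int g_moves = 0;

// Hypothetical stand-in for a string-like value with visible copy/move.
struct Tracked {
  std::string value;
  explicit Tracked(std::string v) : value(std::move(v)) {}
  Tracked(const Tracked& other) : value(other.value) { ++g_copies; }
  Tracked(Tracked&& other) noexcept : value(std::move(other.value)) { ++g_moves; }
};

// Without std::move: each local is an lvalue and is copied into the pair.
std::pair<Tracked, Tracked> split_without_move() {
  Tracked bucket_name("bucket");
  Tracked object_path("path/to/object");
  return {bucket_name, object_path};
}

// With std::move: each local is moved into the pair; nothing is copied.
std::pair<Tracked, Tracked> split_with_move() {
  Tracked bucket_name("bucket");
  Tracked object_path("path/to/object");
  return {std::move(bucket_name), std::move(object_path)};
}
```

Under these assumptions the moves in the diff are doing real work: dropping them would turn two moves into two copies.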
CI fails because of composing failures. Will investigate.
It is documented that the GCS operations we do are strongly consistent.
This ensures thread safety and does not impact performance according to documentation.
53a048f to 11bd496
Using the old version 1.22.0 will no longer compile.
`absl::string_view` and `variant` alias to the `std` types on sufficiently new C++ versions.
This reverts commit fb9bbe3.
Are we sure we want "Remove polling for bucket/object propagation/deletion"? For multi-part uploads, we've learned that GCS doesn't guarantee read-after-write; as such, I think we need the wait for the object to propagate.
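For readers unfamiliar with the pattern under discussion, a polling wait of this kind might look roughly like the sketch below. It is not the code from this PR or from TileDB: the helper name `wait_until_visible`, the `exists` callback, and the attempt and delay parameters are all hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `exists` up to `max_attempts` times, sleeping
// `delay` after each failed check. Returns true as soon as `exists` does,
// false if every attempt fails.
bool wait_until_visible(const std::function<bool()>& exists,
                        int max_attempts,
                        std::chrono::milliseconds delay) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (exists()) {
      return true;
    }
    std::this_thread::sleep_for(delay);
  }
  return false;
}
```

If the underlying operations are strongly consistent, the first check always succeeds and the loop only adds an extra request; if they are not, removing it can surface "object not found" errors right after a write. That trade-off is what the rest of this thread debates.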
Are you referring to this sentence of the docs? We are not using this API, but good question: it is not explicitly mentioned whether object composition is strongly consistent. I submitted feedback on the docs.
That issue does not report a defect. What's there to fix? A putative performance improvement is not a defect.
The GCS docs are incomplete and ambiguous about the relationship between multipart uploads and operations that are strongly consistent. The documentation about strong consistency refers only to a subset of operations, and it does not address mutual consistency between these classes of operations. Therefore, we must assume that multipart uploads and (say) listing are not mutually consistent. In the absence of a clear and definitive statement of that relationship, we must not assume that the system has the properties we might wish it to have. The mere existence of
Recommend closing this PR and not returning to it until we can measure the performance improvement that might be gained for what looks to be a rather large risk.
As discussed: we need to wait on this for 2.16. Please also put back the ports files.
CompleteMultipartUpload is necessary because the multipart API does not require pre-specifying the total size of the final assembled object (it allows appending up to 10,000 parts before finalizing). See: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html
Let's keep the changes separate, so we can ship the first two:
...
@eric-hughes-tiledb we are not using the S3-compatible multi-part uploads in GCS. In fact, they are not even available in the C++ SDK. What we use is the Compose JSON API, which is officially confirmed to be strongly consistent.
Marking as draft again, let's do #4031 first.
A new algorithm for something as basic as a new VFS implementation needs better validation than is present in this PR.
This won't be ready for 2.19.
Closing this for now. The change is a good one, but we don't have time to properly test this at the moment. We also have a tracking story for when we are ready to do this work.
See each commit message for more details. This PR includes a fix for SC-26982
TYPE: IMPROVEMENT
DESC: Various improvements to the Google Cloud Storage VFS