-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35409: [Python][Docs] Clarify S3FileSystem Credentials chain for EC2 #35312
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
|
@kevinjqliu thanks for the contribution! We have a section about S3FileSystem in the user guide as well: https://arrow.apache.org/docs/dev/python/filesystems.html#s3. Could you have a look to see if that might require an equivalent update? |
@jorisvandenbossche thanks for the review. I've updated the formatting issues and made changes to the S3 section of PTAL |
4f8f703
to
a2748ff
Compare
@jorisvandenbossche fixed a formatting issue. there was something wrong with CI too, hence the force pushes. |
@jorisvandenbossche Would you like to take another look? |
hey, @jorisvandenbossche wanted to ping on this again. thanks! |
|
@github-actions crossbow submit docs-preview |
|
@github-actions crossbow submit preview-docs |
Revision: a2748ff Submitted crossbow builds: ursacomputing/crossbow @ actions-4f25ed291d
|
thanks, @kou I've applied your suggestions. Do the commits need to be squashed? |
@github-actions crossbow submit preview-docs |
Revision: 4cdd0d2 Submitted crossbow builds: ursacomputing/crossbow @ actions-889dfc34d1
|
@kou built the changes locally, I've attached a screenshot above |
Thanks but we have 2 changed pages but the screenshot shows only 1 page. |
The visible change is from |
We use it for API reference: https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html#pyarrow.fs.S3FileSystem |
I see. I'm following Building a single directory for dev purposes without all the pre-requisites to render the changes for The changes in |
We need pyarrow to build documents in We need to use the full (at least including pyarrow) build approach. |
Could you rebase on main to use #36943 ? |
The C implementation of `DefaultAWSCredentialsProviderChain` not only reads the environment variable when trying to resolve AWS credentials, but also looks at profile config and the EC2 Instance Metadata Service. I want to document this behavior in the documentation. The `s3fs` documentation mentions the specific case for EC2
Co-authored-by: Sutou Kouhei <[email protected]>
@kou I've rebased the branch and was able to build it locally using docker. Updated the description above to render changes to both files. Thanks for guiding me through the process |
@github-actions crossbow submit preview-docs |
Revision: f4b66d7 Submitted crossbow builds: ursacomputing/crossbow @ actions-84183cdf1d
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 334b46d. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
… for EC2 (apache#35312) ### Rationale for this change When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2. ### What changes are included in this PR? Documentation for the behavior described above. #### Technical Details `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213) The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). ### Are these changes tested? No, just documentation changes ### Are there any user-facing changes? Yes, changing public documentation * Closes: apache#35409 ### Render Changes Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): `docs/source/python/filesystems.rst`: ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6) `python/pyarrow/_s3fs.pyx`: ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d) Lead-authored-by: Kevin Liu <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
… for EC2 (apache#35312) ### Rationale for this change When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service. I want to document this behavior for `pyarrow`. The [`s3fs` documentation](https://s3fs.readthedocs.io/en/latest/#credentials) mention this specific case for EC2. ### What changes are included in this PR? Documentation for the behavior described above. #### Technical Details `S3FileSystem` uses the [`CS3Options.Defaults()`](https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/python/pyarrow/_s3fs.pyx#L317) option when no credentials are passed into the constructor. It utilizes the [`Aws::Auth::DefaultAWSCredentialsProviderChain`](https://github.com/apache/arrow/blob/1de159d0f6763766c19b183dd309b8757723b43a/cpp/src/arrow/filesystem/s3fs.cc#L213) The C++ implementation of [`DefaultAWSCredentialsProviderChain`](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_default_a_w_s_credentials_provider_chain.html) not only [reads the environment variable](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html) when trying to resolve AWS credentials, but also [looks at profile config](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html) and the [EC2 Instance Metadata Service](https://sdk.amazonaws.com/cpp/api/0.14.3/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html). ### Are these changes tested? No, just documentation changes ### Are there any user-facing changes? Yes, changing public documentation * Closes: apache#35409 ### Render Changes Render the changes locally via [Building the doc](https://arrow.apache.org/docs/developers/documentation.html#building-docs): `docs/source/python/filesystems.rst`: ![Screenshot 2023-07-30 at 6 22 02 PM](https://github.com/apache/arrow/assets/9057843/6af053a3-e7a7-4a68-a5b5-02c50e9290c6) `python/pyarrow/_s3fs.pyx`: ![Screenshot 2023-07-31 at 3 31 30 PM](https://github.com/apache/arrow/assets/9057843/d79768be-67ce-46c0-88ed-a833e540f77d) Lead-authored-by: Kevin Liu <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Rationale for this change
When resolving AWS credentials on EC2 hosts, the underlying AWS SDK also looks at the EC2 Instance Metadata Service.
I want to document this behavior for
pyarrow
. Thes3fs
documentation mention this specific case for EC2.What changes are included in this PR?
Documentation for the behavior described above.
Technical Details
S3FileSystem
uses theCS3Options.Defaults()
option when no credentials are passed into the constructor. It utilizes theAws::Auth::DefaultAWSCredentialsProviderChain
The C++ implementation of
DefaultAWSCredentialsProviderChain
not only reads the environment variable when trying to resolve AWS credentials, but also looks at profile config and the EC2 Instance Metadata Service.Are these changes tested?
No, just documentation changes
Are there any user-facing changes?
Yes, changing public documentation
S3FileSystem
behavior in EC2 #35409Render Changes
Render the changes locally via Building the doc:
docs/source/python/filesystems.rst
:python/pyarrow/_s3fs.pyx
: