
S3 support in Kubeflow Pipelines #3405

Closed
Jeffwan opened this issue Mar 31, 2020 · 61 comments
Labels: area/backend, area/frontend, help wanted, platform/aws, status/triaged

Comments

@Jeffwan (Member) commented Mar 31, 2020

Creating this parent issue to track all known S3 problems in KFP; I'd also like to discuss more feature requests in this ticket.

Use cases

Replace Minio with S3 for Argo artifact store and KFP Pipeline store.

[ ] Manifest changes in kubeflow/manifests and the standalone kubeflow/pipelines manifests to easily switch from minio to S3
[ ] IRSA - bump the minio-go SDK version to support IRSA
[ ] IRSA - bump the Argo Workflows version to make sure the runner can persist artifacts to S3

Related issues and PRs

UI

[x] KFP UI should be able to read source files in S3, for example:

mlpipeline-ui-metadata -> minio://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

{
  "outputs": [
    {
      "source": "s3://your_bucket/README.md",
      "type": "markdown"
    }
  ]
}

[ ] KFP UI should be able to read artifact files in S3

mlpipeline-ui-metadata -> s3://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

{
  "outputs": [
    {
      "source": "s3://your_bucket/README.md",
      "type": "markdown"
    }
  ]
}

[ ] IRSA - bump the minio-js version to support IRSA

SDK

[x] User can declare a custom S3 artifact location inside the pipeline.
[x] User can apply AWS credentials to pipeline pods to get access to S3.
[x] User can specify a service account for the pipeline. (IRSA)

More examples

/kind improvement
/area frontend
/area backend

/cc @eterna2 @discordianfish

@k8s-ci-robot (Contributor)
@Jeffwan: The label(s) kind/improvement cannot be applied, because the repository doesn't have them

In response to this:

(the issue description above, quoted in full)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Ark-kun (Contributor) commented Apr 1, 2020

SDK: User can declare a custom S3 artifact location inside the pipeline.

This is deprecated. The supported method is to configure the artifact location in the workflow-configmap

There is also an option to use Minio gateway to proxy requests to the actual storage system (Google Cloud Storage or Amazon S3). WDYT?

@Bobgy added the platform/aws, help wanted, and status/triaged labels on Apr 2, 2020
@Jeffwan (Member, Author) commented Apr 4, 2020

@Ark-kun The artifact location in workflow-configmap works for us now.
Do you have any docs or code references for the minio gateway?

@Ark-kun (Contributor) commented Apr 6, 2020

Do you have any docs or code reference on minio gateway?

Here is the Minio doc: https://docs.minio.io/docs/minio-gateway-for-s3.html

@eterna2 (Contributor) commented Apr 10, 2020

@Jeffwan

Update the metadata writer service to support S3 artifacts.
The metadata writer service currently hardcodes the metadata artifact URL to minio. We need to investigate how the metadata service can infer whether the artifact store endpoint is S3 or minio.

def argo_artifact_to_uri(artifact: dict) -> str:
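A minimal sketch of the kind of inference being discussed, assuming the artifact repository endpoint is available to the writer; the `endpoint` parameter and the `endswith` check below are illustrative, not the actual metadata-writer implementation:

```python
def argo_artifact_to_uri(artifact: dict, endpoint: str = "minio-service:9000") -> str:
    """Build an artifact URI, inferring the scheme from the repository endpoint
    instead of hardcoding minio://. Hypothetical sketch, not KFP source code."""
    s3 = artifact.get("s3", {})
    # Treat AWS endpoints as native S3; anything else as a MinIO-style store.
    scheme = "s3" if endpoint.endswith("amazonaws.com") else "minio"
    return f"{scheme}://{s3.get('bucket')}/{s3.get('key')}"
```

With an AWS endpoint the same artifact dict would yield an `s3://` URI, and with the in-cluster default it stays `minio://`.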

See also

https://kubeflow.slack.com/archives/CE10KS9M4/p1586408517128600

@Ark-kun (Contributor) commented Apr 14, 2020

Metadata writer service currently hardcodes the metadata artifacts URL to minio.

The frontend translates this to ReadArtifact backend calls.

Regarding S3, I think there are two distinct cases:
1. Artifacts - URIs should be based on the workflow-controller-configmap S3 bucket configuration.
2. mlpipeline-ui-metadata entries, which might contain any kind of URIs.

@Jeffwan (Member, Author) commented Apr 16, 2020

@Ark-kun The minio gateway seems to be an extra layer under minio? The user needs to run the minio service with the gateway to delegate requests to S3 - is that correct? I think that solves some of the requirements and is fairly easy, but it brings an extra layer.

mlpipeline-ui-metadata entries that might have any kind of URIs.

Yes. The problem is that the protocol is always minio here; if a user wants to replace minio with S3, it currently doesn't work well.

@discordianfish (Member)

I'm confused as well. If somebody could outline what steps need to be taken to support S3, I can help implement them. But reading the comments here leaves me none the wiser.

@rmgogogo (Contributor)

https://github.com/e2fyi/kubeflow-aws/tree/master/pipelines
Would this Kustomization help?

As for the GCS-backed setup, it's here:
https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/sample

@Ark-kun (Contributor) commented Apr 24, 2020

@Ark-kun The minio gateway seems to be an extra layer under minio? The user needs to run the minio service with the gateway to delegate requests to S3 - is that correct? I think that solves some of the requirements and is fairly easy, but it brings an extra layer.

Yes, it serves as a proxy. The advantage of using it is that all services (UX, backend, Argo, etc.) only need to talk to one kind of service - the in-cluster Minio - and the configuration (credentials etc.) can be specified in one place. @IronPan or @numerology might have more information.

@Jeffwan (Member, Author) commented Apr 24, 2020

@Ark-kun I checked the gateway and ran some tests; it is working fine. I can file a PR for the aws overlay. The remaining question is that some users don't want to use minio as a gateway - do we want to support direct object store access? The PRs listed above aim to solve this problem.

@rmgogogo The e2fyi repo cannot replace minio entirely; it still needs some changes.

@discordianfish (Member)

Isn't minio supposed to be compatible with S3? I don't understand the need for the gateway.

How do we authenticate clients connecting to the minio gateway? Does that require an additional layer of authentication? Or is this enforced by istio on a networking level?

I'm not convinced this is better than granting individual components IAM permissions.

@Jeffwan (Member, Author) commented Apr 28, 2020

@discordianfish The gateway is fairly easy with the current manifest structure: KFP just needs S3 permissions, and minio becomes a proxy to S3. There's no need to make code changes everywhere to check the protocol - if you look at the PRs above, it's fairly painful right now. We would also need to provide credentials or attach roles to every KFP component that talks to S3.

I agree this won't meet everyone's needs; we definitely also need low-level object storage configuration.

@discordianfish
Copy link
Member

@Jeffwan What I don't understand is why we need to distinguish between minio and S3 at all, instead of just tracking the bucket name and the endpoint. The minio client would be configured either with auth credentials or falling back to the instance role. That should be the same config for all components, no?

But okay, if people want to go down the gateway route: how do all the components that need S3/minio access authenticate against the gateway? Not using auth here would be a no-go for us.

@Jeffwan (Member, Author) commented Apr 29, 2020

@discordianfish

What I don't understand is why we need to distinguish between minio and s3 at all, instead of just tracking the bucket name and the endpoint.

I see. It's not consistent across all components. KFP has lots of references with a fixed protocol like minio://, which won't work for S3.

But okay, if people want to go down the gateway route: How do all components that need s3/minio access authenticate against the gateway? Not using auth here would be a no go for us.

The only difference between minio and the minio gateway with S3 is that the user needs to give the minio pod credentials with an S3 policy; an instance profile will work as well. I have not checked IRSA or Kube2IAM for fine-grained control yet.

@eterna2 (Contributor) commented Apr 29, 2020

Just want to add a note.

One main reason we moved to S3 instead of the minio gateway (which was our original deployment) is that TensorBoard doesn't work with minio.

It only supports GCS and S3, and it was using boto3 instead of the minio client the last time I checked (maybe 6 months ago?).

Other than that, I think the minio gateway is fine as long as the throughput is fine. But the concern is that you can't have fine-grained RBAC control with the minio gateway.

And it is confusing to our data scientists when they have to mix S3 and minio (i.e. S3 for our data, minio for data generated within KFP).

@discordianfish (Member)

I see. It's not consistent across all components. KFP has lots of references with a fixed protocol like minio://, which won't work for S3.

Yes, but why? The protocol part in that URL is just for the client to know where to connect. So for minio and S3, I suggest we always use the endpoint, bucket name, and path to refer to an object. It shouldn't matter to the client whether this is S3 or minio. Does that make sense?

The only difference between minio and the minio gateway with S3 is that the user needs to give the minio pod credentials with an S3 policy; an instance profile will work as well. I have not checked IRSA or Kube2IAM for fine-grained control yet.

I understand that, but we need to authenticate access from Kubeflow components to the minio gateway. If I understand you correctly, with the minio gateway everything that is able to reach minio will have access to all S3 content without auth. That's not acceptable for us.

@Jeffwan (Member, Author) commented May 1, 2020

I see. It's not consistent across all components. KFP has lots of references with a fixed protocol like minio://, which won't work for S3.

Yes, but why? The protocol part in that URL is just for the client to know where to connect. So for minio and S3, I suggest we always use the endpoint, bucket name, and path to refer to an object. It shouldn't matter to the client whether this is S3 or minio. Does that make sense?

https://github.com/kubeflow/pipelines/pull/3531/files#diff-4e5a9c52bbed342532f527bae3aae002R15-R20
This is one example; lots of code relies on the protocol for custom logic. I get your point, and that would be the ideal case, but we don't always have the endpoint passed along with the artifact URL everywhere. If there were a refactor to an object with endpoint, bucket, and path, then all the customization could be changed to rely on the endpoint. It would work but takes more effort.
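The refactor being discussed could be sketched roughly like this; `ArtifactRef` and its fields are hypothetical names for illustration, not existing KFP code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArtifactRef:
    """Refer to an artifact by (endpoint, bucket, key) so that clients never
    need to branch on a minio:// vs s3:// scheme. Hypothetical sketch."""
    endpoint: str  # e.g. "minio-service:9000" or "s3.amazonaws.com"
    bucket: str
    key: str

    def url(self) -> str:
        # Scheme-free canonical form; any S3-compatible client can use it.
        return f"{self.endpoint}/{self.bucket}/{self.key}"

# Example: the same reference shape works for in-cluster minio and AWS S3.
ref = ArtifactRef("minio-service:9000", "mlpipeline", "artifacts/run-1/out.tgz")
```

The client then decides how to connect based only on the endpoint, which is the idea proposed above.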

The only difference between minio and the minio gateway with S3 is that the user needs to give the minio pod credentials with an S3 policy; an instance profile will work as well. I have not checked IRSA or Kube2IAM for fine-grained control yet.

I understand that, but we need to authenticate access from Kubeflow components to the minio gateway. If I understand you correctly, with the minio gateway everything that is able to reach minio will have access to all S3 content without auth. That's not acceptable for us.

Right, that's what @eterna2 mentioned as well. Fine-grained control doesn't work with the minio gateway.

@HassanOuda commented Mar 11, 2021

Also, why is this line hardcoding the S3 endpoint as the Amazon endpoint? We have a different S3 instance that is not in Amazon, and the minio endpoint doesn't work.

return re.search('^.*s3.*amazonaws.com.*$', endpoint)

and here:

return !!endpoint.match(/s3.{0,}\.amazonaws\.com\.?.{0,}/i);

lastly:

return !!endpoint.match(/s3.{0,}\.amazonaws\.com\.?.{0,}/i);
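One way to relax the hardcoded check is to keep the quoted amazonaws regex but add an operator-supplied allowlist of S3-compatible endpoints; `EXTRA_S3_ENDPOINTS` below is a hypothetical configuration knob, not an existing KFP setting:

```python
import re

# Mirrors the regex quoted above (case-insensitive, as in the JS version).
AWS_PATTERN = re.compile(r"^.*s3.*amazonaws\.com.*$", re.IGNORECASE)

# Hypothetical: endpoints of non-AWS S3-compatible stores, supplied by the operator.
EXTRA_S3_ENDPOINTS = {"rook-ceph-rgw-my-store.rook-ceph.svc.cluster.local:80"}

def is_s3_endpoint(endpoint: str) -> bool:
    """True for AWS S3 endpoints or any configured S3-compatible endpoint."""
    return bool(AWS_PATTERN.match(endpoint)) or endpoint in EXTRA_S3_ENDPOINTS
```

This keeps existing AWS behavior while letting a non-Amazon S3 instance (like the one described above) be treated as S3 rather than falling through to minio.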

@aaron-arellano commented May 7, 2021

Is assume-web-identity-role, for using IAM Roles for Service Accounts, still an open issue? The thread is not clear on whether this works or not. I am looking to use an IAM Role for Service Accounts to connect ML Pipeline and Argo artifacts to S3 using the annotation eks.amazonaws.com/role-arn: <your-arn-role> @Jeffwan

@karlschriek

It doesn't work yet. The minio version used by pipelines is currently still sitting at github.com/minio/minio-go v6.0.14+incompatible. I think this needs to be merged: #3549

@karlschriek

This issue also needs to be resolved: minio/minio-js#841. As far as I can tell from piecing together the various PRs and issues on this topic, we need both minio-go and minio-js to implement this, and then we need the versions to be bumped on KFP's side.

@aaron-arellano commented May 14, 2021

Thanks for the response; hopefully there will be a resolution to this soon. I know a lot of people would love to use IRSA for S3 to avoid using credentials or third-party K8s IAM providers.

@Bobgy (Contributor) commented May 17, 2021

Hi everyone, for KFP v2 compatible mode we've switched to using https://gocloud.dev/ to access S3. Can someone help me verify whether it supports your use cases, especially IRSA, and how?
You can find more about the design at bit.ly/kfp-v2-compatible

@karlschriek

Hey @Bobgy, I could help with this. Could you give me some more info on what you would like to verify exactly? I guess this will replace the Minio Go and JS clients that are currently in use?

The link bit.ly/kfp-v2-compatible gives me a 404 error.

@Bobgy (Contributor) commented May 17, 2021

Thanks @karlschriek! Yes, we'd like to replace MinIO Go with Go CDK (KFP tasks in pipelines will also use Go CDK to upload/download from storage). So it would be helpful to verify whether Go CDK (as an alternative client for S3) can support the IRSA you're talking about.

Re the 404 error: sorry, I made a mistake; the link is fixed now.

@karlschriek

OK, it looks promising at least: google/go-cloud#2773.

IRSA works by attaching annotations to the ServiceAccounts that get attached to Pods. So all we would need to do is attach the right annotations to the ServiceAccounts for the pipelines API etc., as well as whatever ServiceAccount is used for the pipeline jobs.

I'll provide you with some small examples and also try out the Go SDK.
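As a rough sketch of what that looks like - the ServiceAccount name and role ARN below are placeholders; only the eks.amazonaws.com/role-arn annotation key is the standard IRSA mechanism:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-runner          # the SA used by pipeline pods; name varies per install
  namespace: kubeflow
  annotations:
    # Placeholder ARN: an IAM role with an S3 access policy and a trust
    # relationship for the cluster's OIDC provider.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/kfp-s3-access
```

Pods running under this ServiceAccount then receive a projected web identity token, which AWS SDKs (and, in principle, Go CDK) pick up through the default credential chain.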

@karlschriek

Is there already an issue that tracks this? I guess the current issue is probably not the right place to continue this discussion.

@Bobgy (Contributor) commented May 17, 2021

@karlschriek WDYT about #5656?

@RakeshRaj97 commented Jun 15, 2021

I'm struggling to access my object storage bucket created using rook-ceph (S3) and would like to use it as my default artifact and metadata store instead of MinIO. Can anyone please share which files need to be modified to use it in place of MinIO?

I've tried updating the workflow-controller-configmap to use my object store:

apiVersion: v1
data:
  artifactRepository: |
    archiveLogs: true
    s3:
      endpoint: "rook-ceph-rgw-my-store.rook-ceph.svc.cluster.local:80"
      bucket: "ceph-bkt-xxxx-xxxxx"
      keyFormat: "artifacts/{{workflow.name}}/{{pod.name}}"
      # insecure will disable TLS. Primarily used for minio installs not configured with TLS
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
  containerRuntimeExecutor: docker

I also replaced the accesskey and secretkey in the mlpipeline-minio-artifact secret.

Any clue on this would be really helpful.

@vinayan3

@Jeffwan Did you have a PR or an example for using minio as a gateway to S3? I've been standing up Kubeflow on AWS, and I'm a bit concerned about the number of places where patches need to go in for S3. Instead of patching all these deployments, I'd prefer to just patch the minio components and be done with it.

@Jeffwan (Member, Author) commented Sep 4, 2021

@Jeffwan Did you have a PR or an example for using minio as a gateway to S3? I've been standing up Kubeflow on AWS, and I'm a bit concerned about the number of places where patches need to go in for S3. Instead of patching all these deployments, I'd prefer to just patch the minio components and be done with it.

@vinayan3 Have you checked https://docs.min.io/docs/minio-gateway-for-s3.html? You can change the minio manifests to achieve that; the minio gateway works with S3 out of the box.

@MaximumQuasar

Hi,

Is assume-role STS (temporary access) currently supported for the workflow to upload into S3, and are there any reference docs for this?

By temporary access I mean using an access key, secret key, and security token.

@xwt-ml commented May 10, 2022

Is there a timeline for when, and in which release, the following feature will be supported? @Jeffwan

UI: KFP UI should be able to read artifact files in S3
mlpipeline-ui-metadata -> s3://mlpipeline/artifacts/pipeline-A/pipeline-A-run-id/mlpipeline-ui-metadata.tgz

@surajkota (Contributor)

Hi everyone, please see the proposal/design for using IRSA/IAM roles to access S3 in #8502 and let us know your thoughts.

For anyone looking to use S3 with Kubeflow Pipelines (using IAM user credentials), please refer to the latest AWS distribution of Kubeflow docs: https://awslabs.github.io/kubeflow-manifests/

@thesuperzapper (Member)

Hey all, I just wanted to share that deployKF uses S3 directly (no minio gateway), which also means that it comes with support for IRSA!

It's going to be documented better soon, but these are the values you need to configure to use S3, happy to help anyone who has trouble getting it working.

If you want, you can also read a bit more about deployKF in this comment: kubeflow/manifests#2451 (comment)

@rimolive (Member)

Closing this issue. Looks like this was already implemented.

/close

@k8s-ci-robot

@rimolive: Closing this issue.
