
Community feedback on Kubeflow complexity #2451

Closed
brsolomon-deloitte opened this issue Apr 26, 2023 · 10 comments

Comments

@brsolomon-deloitte

I'd like to offer some constructive feedback from the perspective of a team that has installed Kubeflow to multiple k8s clusters.

In short, Kubeflow is overly heavy-handed in that it bundles in and tightly couples with multiple third-party components rather than focusing on the deployment of the core Kubeflow app itself. It's truly the only app I can think of that virtually necessitates installation into its own dedicated Kubernetes cluster because of the assumptions it makes about what is or is not already present in the target cluster.

Do I want to install an entire distribution of cert-manager? No, we, like many cluster admins, already have that installed and running with issuer configuration that works across application Ingresses. Same thing with Istio and Dex: these components (1) should not be part of the bundled Kubeflow installation and (2) should at the very least be easily swappable for other components that perform the same function.

The response from Kubeflow maintainers in kubeflow/kubeflow#3173 and other places seems to be that Kubeflow is complex enough to warrant and require bundling in heavyweight dependencies such as those above. However, look around the OSS landscape and you'll see plenty of other projects that are complex but still manage relatively lightweight installations: for example, the Elastic Stack supports OIDC, RBAC, and many other features but makes these configurable through native configuration and an operator pattern rather than forcing a hard dependency on specific providers of those functions. I'm not convinced it's simply impossible at this point for Kubeflow to do the same.

I realize it's out of the control of kubeflow/manifests entirely, but this problem is only exacerbated downstream where the same mentality seems to rule - for example, in awslabs/kubeflow-manifests, where the install script hardcodes a helm install of aws-load-balancer-controller.

Constructive feedback: in future major version releases of Kubeflow, focus on offering a more lightweight Helm (or other IaC/CaC) installation of Kubeflow that offers the components of Kubeflow that your users need, and nothing beyond.

@jbottum

jbottum commented Apr 26, 2023

@DomFleischmann @DnPlas @kimwnasptd @annajung @thesuperzapper I am interested to hear if you have any input on this. @brsolomon-deloitte, thanks for this feedback. I believe Mathew was leading an effort, which has a thread here: https://kubeflow.slack.com/archives/C01EY3L525N/p1672935974905539. We are planning 1.8 now; does your team have any cycles to help on this, perhaps by joining the manifests and release teams?

@jbottum

jbottum commented Apr 26, 2023

/kind feature
/priority p2

@thesuperzapper
Member

@jbottum @brsolomon-deloitte While I am hesitant to announce what I have been working on before it's fully ready, I think you and others deserve to see a taste of what installing Kubeflow (and more) will look like in the VERY near future.

It's called deployKF, and you can read more about it on the website: https://www.deploykf.org/

WARNING

  • I really want to stress: please DO NOT attempt to install deployKF until I push the 0.1.0 release; there will be significant breaking changes in the next few days.
  • I expect 0.1.0 will be cut within the next 2 weeks, so I can only ask for your patience until then.

@thesuperzapper
Member

Hi everyone, I am very excited to share the first release of deployKF!

Try it out here: https://github.com/deployKF/deployKF

Why you should care.

With deployKF, it's WAY easier to configure things, because everything has a config value (no more manual patches).

We have patched LOTS of security issues, many of which can't be easily fixed in the upstream manifests:

  • All secrets are randomly generated at install time, rather than being hardcoded in manifests.
  • Reduced attack vectors compared to Kubeflow Manifests, particularly in Istio configurations.
  • Uses Istio with distroless images by default.
  • MinIO (or S3) access keys are isolated to each profile, and scoped to the minimum required permissions.
  • Supports using AWS IRSA instead of S3 access keys.
  • (some redacted stuff)

The whole design of deployKF is focused on creating enterprise-ready ML Platforms that are actually possible to maintain in the long term. We have:

  • In-place upgrades (bring your old custom values files to new versions; we aim not to break backward compatibility)
  • Changes to configs/secrets that automatically restart the affected services
  • GitOps with the industry standard, ArgoCD (note: support for other CD tools will come, and we are even making a native installer that does not require GitOps)

deployKF is more than just Kubeflow: we aim to support all leading open ML & data tools.

❤️ deployKF has been a long labor of love, with work spanning back at least 3 years! I am so excited to see how everyone uses it, and how the industry reacts to the best ML Platform being a free and open-source tool!

@ruckc

ruckc commented Aug 29, 2023

@thesuperzapper - I appreciate your attempt to help make Kubeflow easier to deploy... but it doesn't address the concerns of @brsolomon-deloitte or myself.

My organization has a standard that Kubernetes clusters are built to, with requirements on the ingress controller, storage, and key management. The current complexity of Kubeflow, which appears to be on par with GitLab's deployment, makes it impossible for us to leverage, mainly due to its baked-in cert-manager and Istio.

Ideally, I'd like a Kubeflow deployment option that is a Helm chart, which can be configured with a values.yaml and deployed/upgraded as such.
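To illustrate the kind of interface being asked for, a hypothetical values.yaml for such a chart might look like the sketch below. Every key here is invented for illustration; no official chart with this interface exists today.

```yaml
# Hypothetical values.yaml for an imagined "kubeflow" Helm chart.
# All keys are illustrative only, sketching the desired config surface.
certManager:
  enabled: false            # reuse the cluster's existing cert-manager
  issuerRef:
    name: org-wide-issuer   # point at an issuer the cluster already has
    kind: ClusterIssuer
istio:
  enabled: false            # reuse the cluster's existing mesh/ingress
auth:
  provider: oidc            # point at an existing IdP instead of bundling Dex
  issuerURL: https://idp.example.com
```

The point is that such a chart would then follow the standard `helm install` / `helm upgrade -f values.yaml` lifecycle, with no bundled infrastructure components.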

@thesuperzapper
Member

@ruckc I would love to understand how we can make deployKF meet your needs, but based on your comment, I think it probably already does:

  1. deployKF uses helm-like values (learn more here), and supports in-place upgrades
  2. deployKF allows you to easily bring your own Istio and Cert-Manager
  3. Depending on what you mean by "key management", deployKF probably supports it

PS: Given what deployKF is (a full ML Platform running on Kubernetes), I think it's reasonable to expect some complexity. Its goal after all is to run an ML-focused cloud inside your Kubernetes cluster!


To disable the embedded Istio and Cert-Manager, you simply set their `enabled` values to false.
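For example, a values overlay along these lines (key names assumed from deployKF's dependency-values conventions; verify against the current deployKF reference values before use):

```yaml
# values.yaml overrides to reuse an existing cert-manager and Istio
# instead of the embedded ones (key names assumed; check the deployKF docs)
deploykf_dependencies:
  cert_manager:
    enabled: false   # skip installing the embedded cert-manager
  istio:
    enabled: false   # skip installing the embedded Istio
```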

If you really want to bring your own "istio gateway deployment", you can do this by setting deploykf_core.deploykf_istio_gateway.charts.istioGateway.enabled to false:

  • NOTE: this is not necessary for most users, unless you have some seriously restrictive organization policies!
  • NOTE: this is for the Gateway Deployment/Pods, not the Gateway CRD, which is always managed by deployKF

Either way, once you have istio and cert-manager, you can follow the Expose deployKF Gateway and configure HTTPS guide to expose the gateway publicly (or on your internal network), with a valid SSL certificate.

@juliusvonkohout
Member

> @thesuperzapper - I appreciate your attempt to help make Kubeflow easier to deploy... but it doesn't address the concerns of @brsolomon-deloitte or myself.
>
> My organization has a standard that Kubernetes clusters are built to, with requirements on the ingress controller, storage, and key management. The current complexity of Kubeflow, which appears to be on par with GitLab's deployment, makes it impossible for us to leverage, mainly due to its baked-in cert-manager and Istio.
>
> Ideally, I'd like a Kubeflow deployment option that is a Helm chart, which can be configured with a values.yaml and deployed/upgraded as such.

There are various external distributions that provide Helm charts, e.g. https://github.com/kromanow94/kubeflow-manifests/tree/helmcharts. It is also possible to deploy with kustomize and comment out, e.g., cert-manager. See also #2717.
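As a sketch of the kustomize route, you can copy the example kustomization from kubeflow/manifests and comment out the components your cluster already provides. The paths below are illustrative and vary by release; take them from the kustomization file of the version you actually deploy.

```yaml
# kustomization.yaml (derived from kubeflow/manifests' example;
# exact resource paths differ between releases)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# cert-manager commented out: the cluster already runs its own
# - ../common/cert-manager/cert-manager/base
# - ../common/cert-manager/kubeflow-issuer/base
- ../common/dex/overlays/istio
- ../apps/pipeline/upstream/env/cert-manager/platform-agnostic-multi-user
# ... remaining Kubeflow components from the example kustomization
```

Note that components you keep may still reference the ones you removed (e.g. Certificate resources expecting a kubeflow-issuer), so you typically also need to point them at your existing issuer or gateway.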

@doncorsean

> I'd like to offer some constructive feedback from the perspective of a team that has installed Kubeflow to multiple k8s clusters.
>
> In short, Kubeflow is overly heavy-handed in that it bundles in and tightly couples with multiple third-party components rather than focusing on the deployment of the core Kubeflow app itself. It's truly the only app I can think of that virtually necessitates installation into its own dedicated Kubernetes cluster because of the assumptions it makes about what is or is not already present in the target cluster.
>
> Do I want to install an entire distribution of cert-manager? No, we, like many cluster admins, already have that installed and running with issuer configuration that works across application Ingresses. Same thing with Istio and Dex: these components (1) should not be part of the bundled Kubeflow installation and (2) should at the very least be easily swappable for other components that perform the same function.
>
> [...]
>
> Constructive feedback: in future major version releases of Kubeflow, focus on offering a more lightweight Helm (or other IaC/CaC) installation of Kubeflow that offers the components of Kubeflow that your users need, and nothing beyond.

👏👏

@juliusvonkohout
Member

@doncorsean please do not resurrect this issue. There is #2730 and if something is missing, please create precise and focused separate issues.

@doncorsean

doncorsean commented Jul 8, 2024

Agreed @juliusvonkohout, thanks for directing me to #2730. I've forked the repo @kromanow94 has been working on, deployed it into my EKS cluster via ArgoCD (with independently managed cert-manager, istiod, istio-ingress, and argo-workflows), and am looking into any contributions I can make to the great work @kromanow94 and his team have done to help move things forward. For anyone looking for more information, review #2730; it has the latest updates from @kromanow94 and references to his fork.

@kromanow94 kromanow94 mentioned this issue Jul 8, 2024