Bump Kubeflow (1.7.0) and kustomize (5.1.0) #1283

Merged
merged 4 commits into from
Jul 25, 2023


supertetelman
Collaborator

@supertetelman supertetelman commented Jul 11, 2023

Small bump in Kubeflow version, nothing too exciting.

  • Bump Kustomize v3.2.0 -> v5.1.0
  • Bump Kubeflow to v1.7.0 (supports K8s v1.24 and v1.25)

TODO:
[x] Bump versions
[ ] Fix K8s v1.26 support
[ ] Add HTTPS support
[x] Disable hard security requirement on HTTPS

Testing and docs:
[x] Verify Kubeflow install
[ ] Verify basic functionality of Kubeflow
[ ] Add additional automated testing
[ ] Update docs

This release works on the current DeepOps K8s version. However, if we bump Kubespray to the latest version, as proposed in my other open PR, we will jump to K8s v1.26, which this release of Kubeflow does not officially support. In my testing it mostly worked with the minor API patch included in this PR, but a few Pods were problematic. I will add more debug output in the comments in case others want to jump in and get this working.

@supertetelman
Collaborator Author

Looks like something is not quite right in K8s v1.26. The following Pods are stuck in CrashLoopBackOff:

kubeflow                         admission-webhook-deployment-7d56d4455f-fxqff                     0/1     CrashLoopBackOff   7 (68s ago)     14m
kubeflow                         cache-server-6b44c46d47-bs4lb                                     2/2     Running            0               14m
kubeflow                         centraldashboard-d45f69689-jhmsq                                  2/2     Running            0               14m
kubeflow                         jupyter-web-app-deployment-6d9b7d4f5c-php76                       2/2     Running            0               14m
kubeflow                         katib-controller-7964698977-6snfp                                 0/1     CrashLoopBackOff   7 (2m2s ago)    14m
kubeflow                         katib-db-manager-57474ccbbf-7nj7h                                 1/1     Running            0               14m
kubeflow                         katib-mysql-66c8cdff4f-p782s                                      1/1     Running            0               14m
kubeflow                         katib-ui-57d77d7d75-9kqk5                                         2/2     Running            1 (13m ago)     14m
kubeflow                         kserve-controller-manager-96b896c66-hx44v                         2/2     Running            0               14m
kubeflow                         kserve-models-web-app-9fbcd79f5-27wsd                             2/2     Running            0               14m
kubeflow                         kubeflow-pipelines-profile-controller-6f6bc888df-pgmjh            1/1     Running            0               14m
kubeflow                         metacontroller-0                                                  1/1     Running            0               14m
kubeflow                         metadata-envoy-deployment-7b49bdb748-r8jh4                        1/1     Running            0               14m
kubeflow                         metadata-grpc-deployment-6d744c66bb-tf4bl                         2/2     Running            2 (13m ago)     14m
kubeflow                         metadata-writer-5bfdbf79b7-z5w75                                  2/2     Running            0               14m
kubeflow                         minio-549846c488-b8v9p                                            2/2     Running            0               14m
kubeflow                         ml-pipeline-86d69497fc-zh8mq                                      1/2     CrashLoopBackOff   7 (2m10s ago)   14m
kubeflow                         ml-pipeline-persistenceagent-5789446f9c-6brr6                     2/2     Running            0               14m
kubeflow                         ml-pipeline-scheduledworkflow-fb9fbd76b-wdtwc                     2/2     Running            0               14m
kubeflow                         ml-pipeline-ui-74fcbdddd9-h6c8b                                   2/2     Running            0               14m
kubeflow                         ml-pipeline-viewer-crd-bdf696cb9-jlx6d                            2/2     Running            1 (13m ago)     14m
kubeflow                         ml-pipeline-visualizationserver-845d745b46-p585m                  2/2     Running            0               14m
kubeflow                         mysql-5f968d4688-w5kvn                                            2/2     Running            0               14m
kubeflow                         notebook-controller-deployment-7cdb9d9f7b-f7vnr                   2/2     Running            1 (13m ago)     14m
kubeflow                         profiles-deployment-7cf8b9b794-7xfmw                              2/3     CrashLoopBackOff   7 (118s ago)    14m
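
For anyone who wants to pull the failing Pods out of a listing like the one above without scanning it by eye, here is a minimal sketch. It assumes the text comes from `kubectl get pods -A` with the column order NAMESPACE, NAME, READY, STATUS; the function name `find_unhealthy_pods` is made up for illustration and is not part of DeepOps or this PR.

```python
from typing import List, Tuple


def find_unhealthy_pods(listing: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, pod, status) for pods that are not fully ready.

    Assumes each row looks like: NAMESPACE NAME READY STATUS RESTARTS AGE.
    """
    unhealthy = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything without a READY column.
        if len(fields) < 4 or "/" not in fields[2]:
            continue
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if not (ready_count.isdigit() and total.isdigit()):
            continue
        # Finished Job pods report 0/1 but are not a problem.
        if status == "Completed":
            continue
        # Flag pods that are not Running or have containers that are not ready.
        if status != "Running" or ready_count != total:
            unhealthy.append((namespace, name, status))
    return unhealthy
```

Run against the listing above, this should return the admission-webhook, katib-controller, ml-pipeline, and profiles-deployment Pods, which are the four rows showing CrashLoopBackOff.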

@supertetelman
Collaborator Author

supertetelman commented Jul 11, 2023

In K8s v1.25, everything seems to be up and running, but I see this error when I try to launch a Notebook:
[403] Could not find CSRF cookie XSRF-TOKEN in the request. http://<redacted ip>:30176/jupyter/api/namespaces/kubeflow-deepops-example-com/notebooks

According to the docs, this is caused by a new behavior in Kubeflow that enforces HTTPS. We will need to either enable HTTPS in our Kubeflow setup or set the APP_SECURE_COOKIES flag to False universally.

https://github.com/kubeflow/manifests
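
To make the second option concrete, here is a rough sketch of setting the flag on a Deployment manifest that has already been loaded into a Python dict (for example from YAML). It is not the change made in this PR: the helper name `disable_secure_cookies` is hypothetical, it assumes the web apps read `APP_SECURE_COOKIES` from the container environment, and the exact value the apps accept ("false" here) should be checked against the Kubeflow manifests docs.

```python
import copy


def disable_secure_cookies(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with APP_SECURE_COOKIES set to "false".

    Assumption: the env var is applied to every container listed in the
    manifest; the input dict is left unmodified.
    """
    patched = copy.deepcopy(deployment)
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        env = container.setdefault("env", [])
        for entry in env:
            if entry.get("name") == "APP_SECURE_COOKIES":
                # Overwrite an existing setting instead of adding a duplicate.
                entry["value"] = "false"
                entry.pop("valueFrom", None)
                break
        else:
            # No existing entry was found, so append a new one.
            env.append({"name": "APP_SECURE_COOKIES", "value": "false"})
    return patched


# Hypothetical minimal manifest; the container name is made up for illustration.
example = {
    "kind": "Deployment",
    "metadata": {"name": "jupyter-web-app-deployment", "namespace": "kubeflow"},
    "spec": {"template": {"spec": {"containers": [{"name": "jupyter-web-app"}]}}},
}
patched_example = disable_secure_cookies(example)
```

Doing this "universally" would mean applying the same change to each web app Deployment that issues the CSRF cookie, which is easier to maintain as a kustomize patch than as per-cluster edits.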

@supertetelman supertetelman marked this pull request as draft July 11, 2023 06:58
@supertetelman supertetelman marked this pull request as ready for review July 25, 2023 19:43
@supertetelman supertetelman merged commit c06e9ee into NVIDIA:master Jul 25, 2023
3 of 19 checks passed
@supertetelman supertetelman deleted the kubeflow-1.7 branch July 25, 2023 19:56