attempt to commonize Kubeflow jobs Multikueue support methods #2795

Contributor

@mszadkow mszadkow commented Aug 7, 2024

/kind cleanup

What type of PR is this?

What this PR does / why we need it:

Unify the shared code so that it is not repeated in each Kubeflow job MultiKueue adapter.

Which issue(s) this PR fixes:

Relates #2552

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 7, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 7, 2024

netlify bot commented Aug 7, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 5ed0f3b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66c36a3de9eb690008605005

@mszadkow
Contributor Author

mszadkow commented Aug 7, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Aug 7, 2024
Member

@tenzen-y tenzen-y left a comment

This doesn't seem to match my expectations.
Could we commonize the Kubeflow adapters the same way as the Kubeflow job reconcilers?

https://github.com/kubernetes-sigs/kueue/tree/main/pkg/controller/jobs/kubeflow/kubeflowjob

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 13, 2024
@mszadkow mszadkow marked this pull request as ready for review August 14, 2024 12:38
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 14, 2024
@mszadkow
Contributor Author

@alculquicondor you may want to have a look here too ;)

Contributor

@alculquicondor alculquicondor left a comment

looks good overall :)

Comment on lines +113 to +121
// add the prebuilt workload
labels := remoteJob.GetLabels()
if remoteJob.GetLabels() == nil {
	labels = make(map[string]string, 2)
}
labels[constants.PrebuiltWorkloadLabel] = workloadName
labels[kueuealpha.MultiKueueOriginLabel] = origin
remoteJob.SetLabels(labels)

Contributor

I think this code should even be part of the MK (MultiKueue) workload controller, but that can be left for a follow-up.

Contributor Author

ok, noted
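
For readers following this thread, here is a minimal, self-contained sketch of the label-setting step quoted above, pulled out into a helper so it could in principle live in one shared place, as the reviewer suggests. This is not code from the PR: the `labeled` interface, the `fakeJob` type, the helper name `addPrebuiltWorkloadLabels`, and both label key values are assumptions made for illustration. The real keys are `constants.PrebuiltWorkloadLabel` and `kueuealpha.MultiKueueOriginLabel`.

```go
package main

import "fmt"

// Placeholder label keys for this sketch only (hypothetical values).
// The PR uses constants.PrebuiltWorkloadLabel and kueuealpha.MultiKueueOriginLabel.
const (
	prebuiltWorkloadLabel = "example.com/prebuilt-workload-name"
	multiKueueOriginLabel = "example.com/multikueue-origin"
)

// labeled is the part of an object's metadata accessors that the snippet needs.
type labeled interface {
	GetLabels() map[string]string
	SetLabels(map[string]string)
}

// fakeJob is a hypothetical stand-in for a remote job object.
type fakeJob struct{ labels map[string]string }

func (j *fakeJob) GetLabels() map[string]string  { return j.labels }
func (j *fakeJob) SetLabels(l map[string]string) { j.labels = l }

// addPrebuiltWorkloadLabels sets the two labels while keeping existing ones.
func addPrebuiltWorkloadLabels(obj labeled, workloadName, origin string) {
	labels := obj.GetLabels()
	if labels == nil {
		// Assigning into a nil map panics in Go, so allocate one first.
		labels = make(map[string]string, 2)
	}
	labels[prebuiltWorkloadLabel] = workloadName
	labels[multiKueueOriginLabel] = origin
	obj.SetLabels(labels)
}

func main() {
	job := &fakeJob{} // starts with no labels at all
	addPrebuiltWorkloadLabels(job, "wl-1", "origin-1")
	fmt.Println(job.GetLabels())
}
```

Reading the labels once and checking that local variable for nil avoids the second `GetLabels()` call in the quoted snippet; the behavior is otherwise the same.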

@alculquicondor
Contributor

/release-note-edit

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 16, 2024
Member

@tenzen-y tenzen-y left a comment

Overall, looks good to me.
This is a comment for a followup.

Could you work on consolidating / commonizing the Kubeflow MK integration tests (TFJob/PaddleJob/XGBoostJob/PyTorchJob), for example this one?

ginkgo.It("Should run a TFJob on worker if admitted", func() {

I'm wondering if we can reduce the Kubeflow MK integration tests the same way as the kubeflowjob reconcilers.

@mszadkow
Contributor Author

I'm actually working on this as part of the MPIJob MultiKueue adapter, but I can add it here.

@tenzen-y
Member

It seems this PR will be finalized soon, so it may be better to do the test refactoring in a separate PR so that we don't block merging this one.

@mszadkow mszadkow force-pushed the feature/support-kubeflow-multikueue-common-interface branch from 5f98f05 to 5ed0f3b Compare August 19, 2024 15:52
Contributor

@alculquicondor alculquicondor left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 19, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 282c50d2fe8cd61ed799f6f3196028fd327700e9

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mszadkow


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2024
@k8s-ci-robot k8s-ci-robot merged commit 801734b into kubernetes-sigs:main Aug 19, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Aug 19, 2024
@mbobrovskyi mbobrovskyi deleted the feature/support-kubeflow-multikueue-common-interface branch August 19, 2024 17:06
mbobrovskyi pushed a commit to epam/kubernetes-kueue that referenced this pull request Aug 20, 2024
…etes-sigs#2795)

* attempt to commonize Kubeflow jobs Multikueue support methods

* make generic adapter to be specialized by job type

* Move common Multikueue adapter to Kubeflowjob package

* Changes after code review
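
The "generic adapter to be specialized by job type" idea in the commit list above can be illustrated with a small Go generics sketch. It is only an illustration under stated assumptions, not the adapter that was merged: the `jobObject` interface, the toy `tfJob` and `paddleJob` types, the `adapter` struct, its `buildRemote` method, and the constructor names are all hypothetical. The point is that the shared flow is written once, and each job type supplies only the small pieces that differ.

```go
package main

import "fmt"

// jobObject is a hypothetical, minimal view of a job for this sketch.
type jobObject interface {
	GetName() string
	SetName(string)
}

// tfJob and paddleJob are toy stand-ins, not the real Kubeflow API types.
type tfJob struct {
	name     string
	replicas int
}

func (j *tfJob) GetName() string  { return j.name }
func (j *tfJob) SetName(n string) { j.name = n }

type paddleJob struct {
	name  string
	image string
}

func (j *paddleJob) GetName() string  { return j.name }
func (j *paddleJob) SetName(n string) { j.name = n }

// adapter carries the logic shared by every job type; the function fields
// are the only job-specific parts.
type adapter[T jobObject] struct {
	kind     string
	newJob   func() T
	copySpec func(dst, src T)
}

// buildRemote returns a fresh copy of local, as would be created on a worker.
func (a adapter[T]) buildRemote(local T) T {
	remote := a.newJob()
	remote.SetName(local.GetName())
	a.copySpec(remote, local)
	return remote
}

// newTFJobAdapter specializes the shared adapter for the toy tfJob type.
func newTFJobAdapter() adapter[*tfJob] {
	return adapter[*tfJob]{
		kind:     "TFJob",
		newJob:   func() *tfJob { return &tfJob{} },
		copySpec: func(dst, src *tfJob) { dst.replicas = src.replicas },
	}
}

// newPaddleJobAdapter specializes the shared adapter for the toy paddleJob type.
func newPaddleJobAdapter() adapter[*paddleJob] {
	return adapter[*paddleJob]{
		kind:     "PaddleJob",
		newJob:   func() *paddleJob { return &paddleJob{} },
		copySpec: func(dst, src *paddleJob) { dst.image = src.image },
	}
}

func main() {
	tf := newTFJobAdapter().buildRemote(&tfJob{name: "train", replicas: 3})
	pd := newPaddleJobAdapter().buildRemote(&paddleJob{name: "infer", image: "paddle:latest"})
	fmt.Println(tf.name, tf.replicas)
	fmt.Println(pd.name, pd.image)
}
```

Adding another job type in this sketch means writing one more constructor rather than another copy of `buildRemote`, which is the kind of repetition the PR description says it wants to remove.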
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
release-note-none: Denotes a PR that doesn't merit a release note.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.