Provide Kubeflow PyTorchJob support for MultiKueue #2735

Merged

Contributor

@mszadkow mszadkow commented Jul 31, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

The PR introduces a new MultiKueue adapter to handle PyTorchJob (Kubeflow).
We want to extend MultiKueue capabilities to satisfy the needs of early adopters.

Which issue(s) this PR fixes:

Relates #2552

Special notes for your reviewer:

Does this PR introduce a user-facing change?

MultiKueue: Support for the Kubeflow PyTorchJob

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 31, 2024
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 31, 2024

netlify bot commented Jul 31, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

🔨 Latest commit: 68dc6da
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66ac90ba45513100098eabe3
😎 Deploy Preview: https://deploy-preview-2735--kubernetes-sigs-kueue.netlify.app

@mszadkow
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Jul 31, 2024
@mszadkow mszadkow force-pushed the feature/support-kubeflow-pytorch-in-mk branch from 7384585 to 48c87ee Compare August 1, 2024 12:02
@mszadkow
Contributor Author

mszadkow commented Aug 1, 2024

/retest

Contributor

@trasc trasc left a comment

overall LGTM

@@ -459,6 +460,88 @@ var _ = ginkgo.Describe("MultiKueue", func() {
}, util.Timeout, util.Interval).Should(gomega.Succeed())
})
})

ginkgo.It("Should run a kubeflow PyTorchJob on worker if admitted", func() {
pyTorchJob := testingpytorchjob.MakePyTorchJob("pytorchjob1", managerNs.Name).
Contributor

Add a comment on the worker we expect to run it and why.

(nit: maybe we should make this one run in worker 2)

Contributor Author

Makes sense. I will update this in all of the jobs.

@mszadkow mszadkow marked this pull request as ready for review August 1, 2024 15:01
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2024
Contributor

@trasc trasc left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 8b2a8a046c07026986cb8e09ed8681b1410ba595

Member

@tenzen-y tenzen-y left a comment

Just some nits.

@mszadkow mszadkow force-pushed the feature/support-kubeflow-pytorch-in-mk branch from 6671628 to 68dc6da Compare August 2, 2024 07:54
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@k8s-ci-robot k8s-ci-robot requested a review from trasc August 2, 2024 07:54
@mszadkow
Contributor Author

mszadkow commented Aug 2, 2024

@tenzen-y @trasc updated, ptal

Member

@tenzen-y tenzen-y left a comment

Thanks!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 843db76576fd6082359d2e4e65f56ae3ceb85518

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mszadkow, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit bdc0749 into kubernetes-sigs:main Aug 2, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Aug 2, 2024
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/feature: Categorizes issue or PR as related to a new feature.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants