KEP-2170: Adding CEL validations on v2 TrainJob CRD #2260
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Thanks for doing this.
I left my first feedback.
Could you add integration tests to verify that these validations work?
We can implement those tests in https://github.com/kubeflow/training-operator/tree/126110fd4d76439bd04ca9fdf96bafb7ea3b6910/test/integration/webhook.v2.
Pull Request Test Coverage Report for Build 11003550343
💛 - Coveralls
/hold
Additionally, could you sign the DCO?
5c876bb to 2b97162
Signed-off-by: Akshay Chitneni <[email protected]>
2b97162 to fba853b
/ok-to-test
/assign @saileshd1402 @varshaprasad96
@andreyvelich: GitHub didn't allow me to assign the following users: saileshd1402, varshaprasad96. Note that only kubeflow members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Thank you for this @akshaychitneni!
I left my initial comments.
/assign @kubeflow/wg-training-leads
@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
Why do we need this?
This is a type-scoped rule that makes sure the field is not removed once it has been set. I'm not sure it is necessary, since a default is already being set.
Don't we set it here?
// +kubebuilder:validation:XValidation:rule="self == oldSelf", message="ManagedBy value is immutable"
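To make the difference between the two rules in this thread concrete, here is a minimal Go sketch of their logic. It is not the operator's code: `requiredOnceSet`, `immutable`, and `ptr` are hypothetical helpers, a nil pointer stands in for an unset field, defaulting is ignored, and it assumes the field-scoped transition rule is skipped when the old or new value is absent.

```go
package main

import "fmt"

// requiredOnceSet mirrors the type-scoped rule
// !has(oldSelf.managedBy) || has(self.managedBy).
// A nil pointer stands in for an unset optional field.
func requiredOnceSet(oldManagedBy, newManagedBy *string) bool {
	return oldManagedBy == nil || newManagedBy != nil
}

// immutable mirrors the field-scoped rule self == oldSelf.
// Assumption: the rule is skipped (treated as passing) when either
// the old or the new value is absent.
func immutable(oldManagedBy, newManagedBy *string) bool {
	if oldManagedBy == nil || newManagedBy == nil {
		return true
	}
	return *oldManagedBy == *newManagedBy
}

// ptr is a small helper for building optional string values.
func ptr(s string) *string { return &s }

func main() {
	old := ptr("controller-a") // hypothetical value

	// Removing the field: only the type-scoped rule rejects it.
	fmt.Println(immutable(old, nil), requiredOnceSet(old, nil)) // true false

	// Changing the value: only the field-scoped rule rejects it.
	changed := ptr("controller-b") // hypothetical value
	fmt.Println(immutable(old, changed), requiredOnceSet(old, changed)) // false true
}
```

Under these assumptions the two rules cover different updates (removal versus change); whether the default makes the type-scoped rule redundant is the open question above.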
APIGroup *string `json:"apiGroup,omitempty"`

// Kind of the runtime being referenced.
// It must be one of TrainingRuntime or ClusterTrainingRuntime.
// Defaults to ClusterTrainingRuntime.
// +kubebuilder:default="ClusterTrainingRuntime"
// +kubebuilder:validation:XValidation:rule="self in ['ClusterTrainingRuntime', 'TrainingRuntime']", message="Kind must be ClusterTrainingRuntime or TrainingRuntime if set"
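As a rough illustration of what the default and the `self in [...]` rule on Kind amount to, the following Go sketch applies the default and then checks membership. `resolveKind` and `validKind` are hypothetical names used only here; in the PR the API server does this from the markers, not Go code.

```go
package main

import "fmt"

// defaultKind matches the +kubebuilder:default marker in the diff.
const defaultKind = "ClusterTrainingRuntime"

// resolveKind applies the default when the field is unset (nil).
func resolveKind(kind *string) string {
	if kind == nil {
		return defaultKind
	}
	return *kind
}

// validKind mirrors the CEL rule
// self in ['ClusterTrainingRuntime', 'TrainingRuntime'].
func validKind(kind string) bool {
	switch kind {
	case "ClusterTrainingRuntime", "TrainingRuntime":
		return true
	}
	return false
}

func main() {
	unset := resolveKind(nil)
	fmt.Println(unset, validKind(unset)) // ClusterTrainingRuntime true

	bad := "PyTorchJob" // a Kind outside the allowed set
	fmt.Println(validKind(resolveKind(&bad))) // false
}
```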
@@ -56,6 +56,7 @@ type TrainJobList struct {
}

// TrainJobSpec represents specification of the desired TrainJob.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.managedBy) || has(self.managedBy)", message="ManagedBy is required once set"
type TrainJobSpec struct {
@akshaychitneni Do we want to add validations/defaults for other parts of TrainJob (e.g. Trainer, DatasetConfig, ModelConfig) as part of this PR?
@@ -0,0 +1,155 @@
package cel_v2
I think we can add those integration tests as part of /test/integration/trainjob_controller_test.go, similar to JobSet: https://github.com/kubernetes-sigs/jobset/blob/main/test/integration/controller/jobset_controller_test.go#L49
WDYT @akshaychitneni @tenzen-y?
apiGroup := "kubeflow.org"
kind := "TrainingRuntime"
Let's create a new constants module under /pkg/constants/constants.go for common constants such as Kind and APIGroup, since we will use them in several places.
WDYT @tenzen-y @akshaychitneni?
Do we want to add these under the respective APIs' groupversion_info.go instead, so that the import paths stay cleaner when these constants are used for both API versions?
Sure! I think the GroupVersion is already set here: https://github.com/kubeflow/training-operator/blob/master/pkg/apis/kubeflow.org/v2alpha1/groupversion_info.go#L29, but not the Kind.
@varshaprasad96 @tenzen-y Where do you think we should put the Kind constants?
It would be good to have Kind defined in there too.
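For reference, a minimal sketch of what shared Kind constants could look like, wherever they end up living. The constant names and the `runtimeRef` type are hypothetical; only the string values come from the diff above.

```go
package main

import "fmt"

// Hypothetical constant names; the thread has not settled on names
// or on the package that should hold them.
const (
	TrainingRuntimeKind        = "TrainingRuntime"
	ClusterTrainingRuntimeKind = "ClusterTrainingRuntime"
)

// runtimeRef is a simplified stand-in for the TrainJob runtime
// reference, reduced to the two fields used in the test snippet.
type runtimeRef struct {
	APIGroup *string
	Kind     *string
}

func main() {
	apiGroup := "kubeflow.org"
	kind := TrainingRuntimeKind // replaces the inline string literal
	ref := runtimeRef{APIGroup: &apiGroup, Kind: &kind}
	fmt.Println(*ref.APIGroup, *ref.Kind)
}
```

Tests and controllers would then reference the constants instead of repeating the literals.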
What this PR does / why we need it:
This PR relates to #2209, adding CEL validations to the TrainJob CRD. I will follow up with the validations implemented in the webhook in a separate PR.
cc @andreyvelich @tenzen-y