Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arguments to pass Ray cluster head and worker templates #570

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

astefanutti
Copy link
Contributor

Issue link

Fixes #413.

What changes have been made

This PR adds parameters to the Ray cluster configuration API, so users can provide Pod template for the head and worker nodes, e.g.:

from kubernetes import V1PodTemplateSpec, V1PodSpec, V1Toleration

cluster = Cluster(ClusterConfiguration(
    worker_template=V1PodTemplateSpec(
        spec=V1PodSpec(
            tolerations=[V1Toleration(
                key="nvidia.com/gpu",
                operator="Exists",
                effect="NoSchedule",
            )],
            node_selector={
                "nvidia.com/gpu.present": "true",
            },
        )
    ),
    head_template=V1PodTemplateSpec(...),
))

Verification steps

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 21, 2024
Copy link
Contributor

openshift-ci bot commented Jun 21, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from astefanutti. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Collaborator

@KPostOffice KPostOffice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one comment, which pertains to the scope of what the user can provide. I think it might require more in depth discussion.


def apply_worker_template(cluster_yaml: dict, worker_template: client.V1PodTemplateSpec):
worker = cluster_yaml.get("spec").get("workerGroupSpecs")[0]
merge(worker["template"], worker_template.to_dict(), strategy=Strategy.ADDITIVE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should a user edit the container spec? Specifying partial container spec will only add a new partial entry to the containers list. This becomes an issue when specifying something like extra volume mounts.

Copy link
Contributor Author

@astefanutti astefanutti Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, ideally we'd want to leverage strategic merge patch for that: https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#notes-on-the-strategic-merge-patch. It's easy to do in Go by using https://github.com/kubernetes/apimachinery/blob/master/pkg/util/strategicpatch/patch.go, but it don't know if that's possible in Python.

With mergedeep and the ADDITIVE strategy, lists are not replaced, but elements appended. The problem being that something like:

cluster = Cluster(ClusterConfiguration(
        head_template=V1PodTemplateSpec(
          spec=V1PodSpec(
              containers=[
                  V1Container(name="ray-head", volume_mounts=[
                      V1VolumeMount(name="config", mount_path="path"),
                  ])
              ],
              volumes=[
                  V1Volume(name="config", config_map=V1ConfigMapVolumeSource(name="config")),
              ],
          )
        ),

It appends an entire new container, while it should only append the extra volume mount to the existing container.

Strategic merge patch solves this, by relying on information like patchMergeKey in Go structs (x-kubernetes-patch-merge-key in CRDs) so it knows how to match items by key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't seem to find anything for merging V1PodTemplate specs, only namespaced pods by sending API requests. There's this package https://pypi.org/project/jsonmerge/ which we could use, it would require maintaining a bit of redundant config however 😒.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've actually stumbled upon jsonmerge this morning as well. it seems it'd be possible to provide a merge strategy by key for lists. Let's give it a try? I agree we'd have to maintain some config, unless we figure a way to leverage Kubernetes JSON schema https://github.com/yannh/kubernetes-json-schema?tab=readme-ov-file, that do contain x-kubernetes-patch-merge-key information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@varshaprasad96 maybe you would have some ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One step better than just additive merging could be calculating the diff and then merging them (deepdiff and then deepmerge - with probably custom merging strategies if needed). But this would still not solve the problem with conflicts at the very least. Looks like if we want to leverage JSON schema we either need to use a live client or load the config to guide merging process.

Copy link
Collaborator

@KPostOffice KPostOffice Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a script that we could use as a starting point for generate a jsonmerge schema from a k8 schema. I haven't gotten around to testing it yet. The output looks right based on my understanding of the jsonmerge package documentation, not based on testing done with actual objects. Sorry for the messiness

def merge_schema_from_k8_schema(k8_schema):
    to_return = {}
    k8_properties = {}
    if "array" in k8_schema.get("type", []) and k8_schema.get("x-kubernetes-list-type") != "atomic":
        to_return["items"] = {}
        to_return["items"]["properties"] = {}
        toret_properties = to_return["items"]["properties"]
        k8_properties = k8_schema.get("items", {}).get("properties", {})
    elif "object" in k8_schema.get("type", []):
        to_return["properties"] = {}
        toret_properties = to_return["properties"]
        k8_properties = k8_schema.get("properties", {})
    for key, value in k8_properties.items():
        if "object" in value.get("type", []) and "properties" in value:
            toret_properties[key] = merge_schema_from_k8_schema(value)
        elif "array" in value.get("type", []) and "items" in value:
            toret_properties[key] = merge_schema_from_k8_schema(value)
        else:
            toret_properties[key] = {"mergeStrategy": "overwrite"}

    if "array" in k8_schema.get("type", []):
        if k8_schema.get("x-kubernetes-list-type") == "set":
            to_return["mergeStrategy"] = "arrayMergeById"
            to_return["idRef"] = "/"
        elif k8_schema.get("x-kubernetes-list-type") == "atomic":
            to_return["mergeStrategy"] = "overwrite"
        elif k8_schema.get("x-kubernetes-list-type") == "map":
            if k8_schema.get("x-kubernetes-patch-merge-key"):
                to_return["mergeStrategy"] = "arrayMergeById"
                to_return["id"] = k8_schema["x-kubernetes-patch-merge-key"]
            else:
                to_return["mergeStrategy"] = "overwrite"
    elif "object" in k8_schema.get("type", []):
        to_return["mergeStrategy"] = "objectMerge"
    else:
        to_return["mergeStrategy"] = "overwrite"
    return to_return

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 30, 2024
@openshift-merge-robot
Copy link
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClusterConfiguration should support tolerations
4 participants