
Guided Demo ClusterConfig Image is not optional #612

Open
strangiato opened this issue Jul 26, 2024 · 1 comment

@strangiato

Describe the Bug

In the Basic Ray Demo:

https://github.com/project-codeflare/codeflare-sdk/blob/main/demo-notebooks/guided-demos/0_basic_ray.ipynb

The ClusterConfiguration cell includes a commented-out image parameter with a note saying it is optional. With the parameter commented out, creating the cluster fails with the following error:

ValueError: Image must be specified in the ClusterConfiguration

An older version of the notebook contains the following cell, which sets the image explicitly:

from codeflare_sdk import Cluster, ClusterConfiguration

# Create and configure our cluster object
# The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
cluster = Cluster(ClusterConfiguration(
    name='raytest', 
    head_cpus='500m',
    head_memory=2,
    head_gpus=0, # For GPU enabled workloads set the head_gpus and num_gpus
    num_gpus=0,
    num_workers=2,
    min_cpus='250m',
    max_cpus=1,
    min_memory=4,
    max_memory=4,
    image="quay.io/rhoai/ray:2.23.0-py39-cu121",
    write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources 
    # local_queue="local-queue-name" # Specify the local queue manually
))

This version runs without error.

Codeflare Stack Component Versions

Codeflare SDK: 0.16.4
Other: OpenShift AI 2.11

Steps to Reproduce the Bug

  1. Clone the repo.
  2. Update the token and API server URL.
  3. Run the notebook cells; cluster creation fails with the ValueError described above.

What Have You Already Tried to Debug the Issue?

Uncommenting the image parameter (that is, specifying an image explicitly) resolves the issue.
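Until the validation is relaxed, one way to keep the notebook working is to always pass an image. The sketch below assumes the configuration keyword arguments are collected in a dict before being passed to ClusterConfiguration; the helper ensure_image and the DEFAULT_RAY_IMAGE constant are hypothetical, and the fallback image is simply the one used in the older notebook.

```python
from typing import Any, Dict, Optional

# Fallback taken from the older notebook version (assumption: this image is
# suitable for your environment; check which Ray image your cluster supports).
DEFAULT_RAY_IMAGE = "quay.io/rhoai/ray:2.23.0-py39-cu121"


def ensure_image(config_kwargs: Dict[str, Any],
                 default_image: str = DEFAULT_RAY_IMAGE) -> Dict[str, Any]:
    """Return a copy of config_kwargs whose "image" entry is non-empty.

    Hypothetical helper: it treats None and "" as missing, which matches
    the condition the SDK's validate_image_config rejects.
    """
    image: Optional[str] = config_kwargs.get("image")
    result = dict(config_kwargs)  # leave the caller's dict untouched
    if image is None or image == "":
        result["image"] = default_image
    return result


cluster_kwargs = {
    "name": "raytest",
    "head_cpus": "500m",
    "head_memory": 2,
    "num_workers": 2,
    # "image" deliberately omitted, as in the failing notebook cell
}
cluster_kwargs = ensure_image(cluster_kwargs)
print(cluster_kwargs["image"])
```

With codeflare-sdk installed, the resulting dict could then be passed as Cluster(ClusterConfiguration(**cluster_kwargs)).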

Expected Behavior

The image field does not appear to be optional. If it is optional only in newer SDK versions, the example code should note that.

Screenshots, Console Output, Logs, etc.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 3
      1 # Create and configure our cluster object
      2 # The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
----> 3 cluster = Cluster(ClusterConfiguration(
      4     name='raytest', 
      5     head_cpus='500m',
      6     head_memory=2,
      7     head_gpus=0, # For GPU enabled workloads set the head_gpus and num_gpus
      8     num_gpus=0,
      9     num_workers=2,
     10     min_cpus='250m',
     11     max_cpus=1,
     12     min_memory=4,
     13     max_memory=4,
     14     # image="quay.io/rhoai/ray:2.23.0-py39-cu121",
     15     write_to_file=False, # When enabled Ray Cluster yaml files are written to /HOME/.codeflare/resources 
     16     # local_queue="local-queue-name" # Specify the local queue manually
     17 ))

File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:70, in Cluster.__init__(self, config)
     63 """
     64 Create the resource cluster object by passing in a ClusterConfiguration
     65 (defined in the config sub-module). An AppWrapper will then be generated
     66 based off of the configured resources to represent the desired cluster
     67 request.
     68 """
     69 self.config = config
---> 70 self.app_wrapper_yaml = self.create_app_wrapper()
     71 self._job_submission_client = None
     72 self.app_wrapper_name = self.config.name

File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:132, in Cluster.create_app_wrapper(self)
    127         raise TypeError(
    128             f"Namespace {self.config.namespace} is of type {type(self.config.namespace)}. Check your Kubernetes Authentication."
    129         )
    131 # Validate image configuration
--> 132 self.validate_image_config()
    134 # Before attempting to create the cluster AW, let's evaluate the ClusterConfig
    136 name = self.config.name

File /opt/app-root/lib64/python3.9/site-packages/codeflare_sdk/cluster/cluster.py:114, in Cluster.validate_image_config(self)
    107 """
    108 Validates that the image configuration is not empty.
    109 
    110 :param image: The image string to validate
    111 :raises ValueError: If the image is not specified
    112 """
    113 if self.config.image == "" or self.config.image == None:
--> 114     raise ValueError("Image must be specified in the ClusterConfiguration")

ValueError: Image must be specified in the ClusterConfiguration
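The failing check can be reproduced without a cluster. The sketch below copies the condition from Cluster.validate_image_config as shown in the traceback above; StubClusterConfig is a hypothetical stand-in for the SDK's ClusterConfiguration that models only the fields relevant here, and its default of None for image is an assumption for illustration (the check rejects the empty string as well).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubClusterConfig:
    # Hypothetical stand-in for codeflare_sdk's ClusterConfiguration;
    # only the fields needed to show this bug are modelled.
    name: str
    image: Optional[str] = None


def validate_image_config(config: StubClusterConfig) -> None:
    # Same condition as cluster.py line 113 in the traceback,
    # written with an identity check for None.
    if config.image == "" or config.image is None:
        raise ValueError("Image must be specified in the ClusterConfiguration")


# Image commented out: the missing value triggers the error.
try:
    validate_image_config(StubClusterConfig(name="raytest"))
except ValueError as err:
    print(err)

# Image set explicitly, as in the older notebook: validation passes.
validate_image_config(
    StubClusterConfig(name="raytest", image="quay.io/rhoai/ray:2.23.0-py39-cu121")
)
```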

Affected Releases

The issue appears to have been introduced in commit 5262e26.


@Bobbins228
Contributor

Hey @strangiato, this should be fixed as of SDK version 0.17.0.
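Going by this comment, a notebook could decide whether it must pass an image by checking the installed SDK version. This is only a sketch under that assumption: the 0.17.0 threshold comes from the comment, the helper names parse_version and image_required are hypothetical, and the parser handles only plain numeric versions such as 0.16.4 (no pre-release suffixes).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Threshold taken from the maintainer's comment (assumption: the fix is in 0.17.0).
FIXED_IN: Tuple[int, int, int] = (0, 17, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    # Minimal parser for "X.Y" or "X.Y.Z"; pads missing parts with zeros.
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def image_required(sdk_version: str) -> bool:
    # True when the SDK predates the fix and rejects a missing image.
    return parse_version(sdk_version) < FIXED_IN


installed: Optional[str]
try:
    installed = version("codeflare-sdk")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("codeflare-sdk is not installed")
else:
    print(installed, "requires an explicit image:", image_required(installed))
```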
