
[Feature] Integration test case with Data Science Pipeline, CodeFlare and KubeRay #425

Open
yuanchi2807 opened this issue Dec 6, 2023 · 4 comments

Comments

@yuanchi2807

Name of Feature or Improvement

Create an integration test case to validate DSP, CodeFlare and KubeRay implementation.

Describe the Solution You Would Like to See

Test environment assumptions:

  1. Data Science Pipeline v1.
  2. The Ray cluster shall consist of no more than 2 worker pods, each with 2 CPU cores and less than 6 GB of memory available.
  3. Total integration test execution time shall be less than 20 minutes.
  4. S3 storage may be available, if needed.
  5. Free of proprietary intellectual property.
  6. Public data only.
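The worker sizing in assumption 2 could be expressed in a KubeRay RayCluster manifest along these lines. This is a hypothetical sketch only: the metadata name, group name, and resource values are illustrative and not taken from the prototype.

```yaml
# Hypothetical sketch: a RayCluster sized per assumption 2
# (2 workers, 2 CPU cores and under 6 GB memory each).
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: dsp-int-test-cluster   # illustrative name
spec:
  headGroupSpec:
    template:
      spec:
        containers:
        - name: ray-head
          image: quay.io/yuanchichang_ibm/integration_testing/dsp_codeflare_int_testing:0.1
          resources:
            limits:
              cpu: "2"
              memory: 6Gi
  workerGroupSpecs:
  - groupName: workers          # illustrative name
    replicas: 2
    minReplicas: 2
    maxReplicas: 2
    template:
      spec:
        containers:
        - name: ray-worker
          image: quay.io/yuanchichang_ibm/integration_testing/dsp_codeflare_int_testing:0.1
          resources:
            limits:
              cpu: "2"
              memory: 6Gi
```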

Proposed test case: clustering text documents using k-means, based on the scikit-learn example page.

https://scikit-learn.org/stable/auto_examples/text/plot_document_clustering.html

Data Science Pipeline stages:

  1. Download test data (https://scikit-learn.org/stable/auto_examples/text/plot_document_clustering.html#loading-text-data).
  2. Launch a Ray cluster with two worker pods.
  3. The Ray driver launches two Ray actors, one deployed to each pod. The first actor runs TfidfVectorizer followed by KMeans clustering and evaluation; the second actor runs HashingVectorizer followed by KMeans clustering and evaluation.
  4. The Ray driver collects the evaluation results from both actors and reports the summaries.
  5. The Ray cluster is stopped and shut down.
  6. The pipeline run completes.
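The driver/actor control flow in stages 3–4 could be sketched as below. To keep the sketch self-contained it uses stdlib `concurrent.futures` as a stand-in for Ray actors; in the actual test, each worker would be a `@ray.remote` actor fitting scikit-learn's TfidfVectorizer or HashingVectorizer followed by KMeans. All function and field names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_clustering(vectorizer_name: str) -> dict:
    """Stand-in for one Ray actor: vectorize, cluster, evaluate.

    A real actor would fit the named vectorizer, run KMeans, and
    compute metrics such as homogeneity and silhouette score.
    """
    return {"vectorizer": vectorizer_name, "metrics": {}}


def driver() -> list:
    """Stand-in for the Ray driver: fan out two actors, collect results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_clustering, name)
            for name in ("TfidfVectorizer", "HashingVectorizer")
        ]
        # Collect evaluation summaries from both workers (stage 4).
        return [f.result() for f in futures]


if __name__ == "__main__":
    for summary in driver():
        print(summary)
```

The point of running the two vectorizers in separate actors is that each lands on its own worker pod, exercising cluster scheduling rather than a single-node run.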

Expected test assets:

  1. DSP pipeline YAML to deploy and kick off test runs.
  2. Test image containing Ray and the document clustering code.
  3. CodeFlare image to deploy the test image.
  4. Preconfigured credentials and ConfigMaps in the test environment.
@yuanchi2807
Author

Cross posting from https://github.com/opendatahub-io/data-science-pipelines/issues/179

A prototype following the above solution design can be found at https://github.com/yuanchi2807/dsp_codeflare_int_testing

The Ray application image can be pulled from quay.io/yuanchichang_ibm/integration_testing/dsp_codeflare_int_testing:0.1

The pipeline definition yet_another_ray_integration_test.py is adapted from https://github.com/diegolovison/ods-ci/blob/ray_integration/ods_ci/tests/Resources/Files/pipeline-samples/ray_integration.py to point to the custom image and invoke docker_clustering_driver.py through the Ray Jobs API.

Please feel free to comment.

@anishasthana
Member

fyi @sutaakar

@sutaakar
Contributor

sutaakar commented Dec 13, 2023

On the first look it looks fine to me. I will try to run it this week.
Waiting for feedback from Diego, as he has more experience with Pipelines.

@yuanchi2807
Author

> On the first look it looks fine to me. I will try to run it this week. Waiting for feedback from Diego, as he has more experience with Pipelines.

My prototype is meant to test the waters; it can be extended with additional pipeline stages.
