Handle pipeline migration concerns for BYO catalog type changes #2262

Closed
1 of 2 tasks
kiersten-stokes opened this issue Oct 29, 2021 · 4 comments · Fixed by #2321

@kiersten-stokes
Member

kiersten-stokes commented Oct 29, 2021

Is your feature request related to a problem? Please describe.
We will have another 1:1 node op change for our preloaded components that will require some frontend pipeline migration and migration of our pytest resources.

Describe the solution you'd like
The following 1:1 mapping should work for the built-in catalog components:

"run_notebook_using_papermill_Runnotebookusingpapermill": "local-directory-catalog:61e6f4141f65",
"filter_text_using_shell_and_grep_Filtertext": "local-directory-catalog:737915b826e9",
"component_Downloaddata": "url-catalog:c6c0588048ae",
"component_Calculatedatahash": "url-catalog:4fc759382b1b",
"bash_operator_BashOperator": "url-catalog:49f8e61b78c3",
"email_operator_EmailOperator": "url-catalog:8bef428ea3cd",
"http_operator_SimpleHttpOperator": "url-catalog:e97030fb448a",
"spark_sql_operator_SparkSqlOperator": "url-catalog:ff0d51b70719",
"spark_submit_operator_SparkSubmitOperator": "url-catalog:2756314f3ff5",
"slack_operator_SlackAPIPostOperator": "local-file-catalog:81b4f925702e"

We should also have the following mapping for the component_source property for the properties below (old -> new):

  • "component_Calculatedatahash"/"url-catalog:4fc759382b1b":
'https://raw.githubusercontent.com/kubeflow/pipelines/1.6.0/components/basics/Calculate_hash/component.yaml'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/kubeflow/pipelines/1.6.0/components/basics/Calculate_hash/component.yaml'}}
  • "component_Downloaddata"/"url-catalog:c6c0588048ae":
'https://raw.githubusercontent.com/kubeflow/pipelines/1.6.0/components/web/Download/component.yaml'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/kubeflow/pipelines/1.6.0/components/web/Download/component.yaml'}}
  • "bash_operator_BashOperator"/"url-catalog:49f8e61b78c3":
'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/bash_operator.py'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/bash_operator.py'}}
  • "email_operator_EmailOperator"/"url-catalog:8bef428ea3cd":
'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/email_operator.py'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/email_operator.py'}}
  • "http_operator_SimpleHttpOperator"/"url-catalog:e97030fb448a":
'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/http_operator.py'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/http_operator.py'}}
  • "spark_sql_operator_SparkSqlOperator"/"url-catalog:ff0d51b70719":
'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/contrib/operators/spark_sql_operator.py'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/contrib/operators/spark_sql_operator.py'}}
  • "spark_submit_operator_SparkSubmitOperator"/"url-catalog:2756314f3ff5":
'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/contrib/operators/spark_submit_operator.py'
->
{'catalog_type': 'url-catalog', 'component_ref': {'url': 'https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/contrib/operators/spark_submit_operator.py'}}
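
For the URL-based components, the old -> new conversion listed above amounts to wrapping the existing string in a small structure. A minimal sketch in Python, assuming the old value is always a plain URL string; the function name `url_source_to_ref` is hypothetical and not part of Elyra:

```python
def url_source_to_ref(old_source: str) -> dict:
    """Wrap an old-style URL string in the url-catalog structure from the list above."""
    return {
        "catalog_type": "url-catalog",
        "component_ref": {"url": old_source},
    }


if __name__ == "__main__":
    old = "https://raw.githubusercontent.com/apache/airflow/1.10.15/airflow/operators/bash_operator.py"
    print(url_source_to_ref(old))
```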

The remaining 3 components are a bit stickier in that the old and new values include absolute locations on a user's machine. The general format for these components will be as follows:

  • "run_notebook_using_papermill_Runnotebookusingpapermill"/"local-directory-catalog:61e6f4141f65":
'[ENV_JUPYTER_PATH[0]]/components/kfp/run_notebook_using_papermill.yaml'
->
{'catalog_type': 'local-directory-catalog', 'component_ref': {'base_dir': '[ENV_JUPYTER_PATH[0]]/components/kfp', 'path': 'run_notebook_using_papermill.yaml'}}
  • "filter_text_using_shell_and_grep_Filtertext"/"local-directory-catalog:737915b826e9":
'[ENV_JUPYTER_PATH[0]]/components/kfp/filter_text_using_shell_and_grep.yaml'
->
{'catalog_type': 'local-directory-catalog', 'component_ref': {'base_dir': '[ENV_JUPYTER_PATH[0]]/components/kfp', 'path': 'filter_text_using_shell_and_grep.yaml'}}
  • "slack_operator_SlackAPIPostOperator"/"local-file-catalog:81b4f925702e":
'[ENV_JUPYTER_PATH[0]]/components/airflow/slack_operator.py'
->
{'catalog_type': 'local-file-catalog', 'component_ref': {'base_dir': '[ENV_JUPYTER_PATH[0]]/components', 'path': 'airflow/slack_operator.py'}}
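
Because these three values depend on where Jupyter is installed, a migration would likely need to split the absolute path against a known base directory rather than use a fixed lookup. The sketch below illustrates that split under stated assumptions: the caller supplies the catalog type and base directory (the two catalog types above use different base directories), the function name `local_source_to_ref` is hypothetical, and `/usr/local/share/jupyter` stands in for `ENV_JUPYTER_PATH[0]`.

```python
from pathlib import PurePosixPath


def local_source_to_ref(old_source: str, catalog_type: str, base_dir: str) -> dict:
    """Split an absolute component path into base_dir and a path relative to it.

    Raises ValueError if old_source is not located under base_dir.
    """
    relative_path = PurePosixPath(old_source).relative_to(base_dir)
    return {
        "catalog_type": catalog_type,
        "component_ref": {"base_dir": base_dir, "path": str(relative_path)},
    }


if __name__ == "__main__":
    jupyter_path = "/usr/local/share/jupyter"  # assumed stand-in for ENV_JUPYTER_PATH[0]
    print(
        local_source_to_ref(
            f"{jupyter_path}/components/airflow/slack_operator.py",
            "local-file-catalog",
            f"{jupyter_path}/components",
        )
    )
```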

Additional context
Since this is a relatively easy change, I would probably hold off on implementing it until we're sure that we've finalized these ops.

@kiersten-stokes
Member Author

#2272 will affect migration for the ops/ids and the component_source property values. I'll update with the latest as soon as that's merged or at least finalized.

@kevin-bates
Member

kevin-bates commented Nov 4, 2021

The Runtime Types PR (#2263) essentially takes the old format of the pipeline file's app_data:

      "app_data": {
        "ui_data": {
          "comments": []
        },
        "version": 5,
        "runtime": "kfp",
        "properties": {
          "name": "kfp_custom",
          "runtime": "kfp",
          "description": "3-node custom component pipeline"
        }
      },

and converts to this format:

      "app_data": {
        "ui_data": {
          "comments": []
        },
        "version": 6,
        "runtime": "kfp",
        "runtime_type": "KUBEFLOW_PIPELINES",
        "properties": {
          "name": "kfp_custom",
          "description": "3-node custom component pipeline"
        }
      },

To summarize:

  1. The app_data.properties.runtime entry is renamed to runtime_type and moved out of properties to be a sibling to app_data.runtime. Note that app_data.runtime may not always be present.
  2. The value of the (new) app_data.runtime_type field (from the old app_data.properties.runtime field) should be mapped as follows:
| old value | new value |
|---|---|
| kfp | KUBEFLOW_PIPELINES |
| airflow | APACHE_AIRFLOW |
| non-existent | non-existent (*) |
| generic | non-existent (*) |

(*): a non-existent or empty runtime_type value implies a Generic pipeline. This "hint" is only used by the UI.
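
The two summary steps can be sketched in Python as follows. This is an illustration of the described transformation only, not the migration code that shipped; the function and constant names are hypothetical, and bumping `version` to 6 follows the before/after example above.

```python
import copy

# Old app_data.properties.runtime value -> new app_data.runtime_type value.
# "generic" and missing values are deliberately absent: they map to no runtime_type.
RUNTIME_TYPE_BY_RUNTIME = {
    "kfp": "KUBEFLOW_PIPELINES",
    "airflow": "APACHE_AIRFLOW",
}


def migrate_app_data_v5_to_v6(app_data: dict) -> dict:
    """Return a migrated copy of a version 5 app_data dict; the input is not modified."""
    migrated = copy.deepcopy(app_data)
    # Step 1: remove the entry from properties (it may not exist).
    old_runtime = migrated.get("properties", {}).pop("runtime", None)
    # Step 2: map the old value; unmapped values leave runtime_type unset.
    runtime_type = RUNTIME_TYPE_BY_RUNTIME.get(old_runtime)
    if runtime_type is not None:
        migrated["runtime_type"] = runtime_type
    migrated["version"] = 6
    return migrated
```

Leaving `runtime_type` unset for `generic` matches the footnote: a missing value already implies a Generic pipeline.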

Test assets (pipeline files) have been updated for the integration tests (but left at version 5). Server test resources (pipeline files) have only been updated as needed (and also left at version 5).

@kiersten-stokes
Member Author

kiersten-stokes commented Nov 5, 2021

Here's a table summarizing the before/after. The component_source field value is still technically a string as of right now (until #2256). Note that I included the intermediate op that's listed in the issue, just in case work on this had already been done and a find-and-replace would make sense.

This table is up to date with the latest from examples PR #79 and #2286 as of 11.10.

| v3.2.x op | op before finalized | final op | final component_source |
|---|---|---|---|
| "run_notebook_using_papermill_Runnotebookusingpapermill" | "local-directory-catalog:61e6f4141f65" | "elyra-kfp-examples-catalog:61e6f4141f65" | {'catalog_type': 'elyra-kfp-examples-catalog', 'component_ref': {'component-id': 'run_notebook_using_papermill.yaml'}} |
| "filter_text_using_shell_and_grep_Filtertext" | "local-directory-catalog:737915b826e9" | "elyra-kfp-examples-catalog:737915b826e9" | {'catalog_type': 'elyra-kfp-examples-catalog', 'component_ref': {'component-id': 'filter_text_using_shell_and_grep.yaml'}} |
| "component_Downloaddata" | "url-catalog:c6c0588048ae" | "elyra-kfp-examples-catalog:a08014f9252f" | {'catalog_type': 'elyra-kfp-examples-catalog', 'component_ref': {'component-id': 'download_data.yaml'}} |
| "component_Calculatedatahash" | "url-catalog:4fc759382b1b" | "elyra-kfp-examples-catalog:d68ec7fcdf46" | {'catalog_type': 'elyra-kfp-examples-catalog', 'component_ref': {'component-id': 'calculate_hash.yaml'}} |
| "bash_operator_BashOperator" | "url-catalog:49f8e61b78c3" | "elyra-airflow-examples-catalog:3a55d015ea96" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'bash_operator.py'}} |
| "email_operator_EmailOperator" | "url-catalog:8bef428ea3cd" | "elyra-airflow-examples-catalog:a043648d3897" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'email_operator.py'}} |
| "http_operator_SimpleHttpOperator" | "url-catalog:e97030fb448a" | "elyra-airflow-examples-catalog:b94cd49692e2" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'http_operator.py'}} |
| "spark_sql_operator_SparkSqlOperator" | "url-catalog:ff0d51b70719" | "elyra-airflow-examples-catalog:3b639742748f" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'spark_sql_operator.py'}} |
| "spark_submit_operator_SparkSubmitOperator" | "url-catalog:2756314f3ff5" | "elyra-airflow-examples-catalog:b29c25ec8bd6" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'spark_submit_operator.py'}} |
| "slack_operator_SlackAPIPostOperator" | "local-file-catalog:81b4f925702e" | "elyra-airflow-examples-catalog:16a204f716a2" | {'catalog_type': 'elyra-airflow-examples-catalog', 'component_ref': {'component-id': 'slack_operator.py'}} |
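
Since the mapping is 1:1, a migration could be driven by a lookup that accepts either the v3.2.x op or the intermediate op. The Python sketch below covers only three rows of the table to stay short. The names `FINAL_BY_V32_OP`, `V32_BY_INTERMEDIATE_OP`, and `migrate_op` are hypothetical, and returning component_source as a dict rather than a string is an assumption (the field was still a string at the time of the comment).

```python
# Subset of the table above: v3.2.x op -> (final op, component-id).
FINAL_BY_V32_OP = {
    "component_Downloaddata": ("elyra-kfp-examples-catalog:a08014f9252f", "download_data.yaml"),
    "bash_operator_BashOperator": ("elyra-airflow-examples-catalog:3a55d015ea96", "bash_operator.py"),
    "slack_operator_SlackAPIPostOperator": ("elyra-airflow-examples-catalog:16a204f716a2", "slack_operator.py"),
}

# Intermediate ops ("op before finalized") resolve to the same rows.
V32_BY_INTERMEDIATE_OP = {
    "url-catalog:c6c0588048ae": "component_Downloaddata",
    "url-catalog:49f8e61b78c3": "bash_operator_BashOperator",
    "local-file-catalog:81b4f925702e": "slack_operator_SlackAPIPostOperator",
}


def migrate_op(op: str):
    """Return (final_op, component_source) for a known op, or None if no migration applies."""
    v32_op = V32_BY_INTERMEDIATE_OP.get(op, op)
    if v32_op not in FINAL_BY_V32_OP:
        return None
    final_op, component_id = FINAL_BY_V32_OP[v32_op]
    # The catalog type is the part of the final op before the colon.
    catalog_type = final_op.split(":", 1)[0]
    component_source = {
        "catalog_type": catalog_type,
        "component_ref": {"component-id": component_id},
    }
    return final_op, component_source
```

Ops that are already final, or that belong to user-defined components, fall through to `None` and would be left unchanged by a caller.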

@kevin-bates
Member

@ajbozarth - please see my update to the table in #2262 (comment). With the changes in #2287, it's important that "generic" not appear as a runtime_type value.

ajbozarth added a commit to elyra-ai/pipeline-editor that referenced this issue Nov 30, 2021
Addresses elyra-ai/elyra#2262

Increments pipeline version to 6 and adds migration code

Tests to be added in followup PR
ajbozarth added a commit that referenced this issue Nov 30, 2021
Fixes #2262

Sister PR to elyra-ai/pipeline-editor#172

Update the pipeline version and migration code to handle the changes
in Elyra 3.3

Pin package versions to prevent future backwards compatibility errors

Co-authored-by: Alan Chin <[email protected]>