Multi-task jobs #411

Open
andrii-i opened this issue Jul 26, 2023 · 8 comments
Labels
enhancement New feature or request
Milestone

Comments

@andrii-i
Collaborator

Problem

No support for multi-task jobs

Proposed Solution

Provide the ability to run multiple tasks as one job, with the tasks and their dependencies expressed as a DAG. This work depends on the Dask/Ray backend implementation (#410).

@Zsailer
Member

Zsailer commented Dec 4, 2023

Since the other issue was closed as a duplicate, I'm porting some of the discussion over here. cc @akshaychitneni

There is some helpful REST API design—borrowing from Elyra's similar functionality—that I don't think we should lose here.

Akshay has been thinking a lot about this topic recently. Let's use this thread to collaborate on this feature.


Problem

Jupyter Scheduler currently enables users to create and manage background jobs that execute a notebook file. We would like to extend jobs to support multiple notebook tasks, where each task executes a notebook file, and to allow dependencies between tasks. We want to initiate a discussion on extending Jupyter Scheduler so that users can create and manage notebook workflows and their associated runs in a Jupyter workspace. This would also require a UX that lets users easily create tasks and their associated dependencies using a DAG editor.

Proposed Solution

Tentative model:

DescribeJobDefinition API Response:

{
    "name": "test1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "schedule": "0 0 * * MON-FRI",
    "timezone": "America/Los_Angeles",
    "job_definition_id": "b5c6099c-9bec-4a04-968f-37e9c23c0f9b",
    "create_time": 1701124758318,
    "update_time": 1701124758317,
    "active": true,
    "tasks": [
     {  
        "name": "task1",
        "input_filename": "Untitled1.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": null,
        "dependsOn": []
     },
     {
        "name": "task2",
        "input_filename": "Untitled2.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": "all_success",
        "dependsOn": ["task1"]
     }
    ]
}
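
As a rough illustration of how the `dependsOn` lists in the model above could be checked before a job definition is accepted, here is a minimal Python sketch. It is not part of the proposal: the function name `execution_order` is hypothetical, and the only fields it reads are `name` and `dependsOn` from the tentative model. It rejects references to unknown tasks and dependency cycles, then returns one valid execution order.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+


def execution_order(tasks):
    """Return task names in an order that respects every `dependsOn` entry.

    `tasks` is a list of dicts shaped like the entries of the tentative
    "tasks" array; only "name" and "dependsOn" are read. (Hypothetical helper.)
    """
    names = {task["name"] for task in tasks}
    graph = {}
    for task in tasks:
        deps = task.get("dependsOn") or []
        unknown = [dep for dep in deps if dep not in names]
        if unknown:
            raise ValueError(f"{task['name']} depends on unknown tasks: {unknown}")
        # TopologicalSorter expects a mapping of node -> set of predecessors.
        graph[task["name"]] = set(deps)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The second element of err.args holds the detected cycle.
        raise ValueError(f"task dependencies contain a cycle: {err.args[1]}") from err


tasks = [
    {"name": "task2", "dependsOn": ["task1"]},
    {"name": "task1", "dependsOn": []},
]
print(execution_order(tasks))  # ['task1', 'task2']
```

Validating at creation time would keep a malformed DAG from ever being scheduled, though whether that check belongs in the REST handler or in the backend is an open design question.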

DescribeJob API Response:

{
    "name": "job1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "job_id": "27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "url": "/jobs/27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "create_time": 1696264966089,
    "update_time": 1696264968551,
    "start_time": 1696264967241,
    "end_time": 1696264968550,
    "status": "COMPLETED",
    "status_message": null,
     "tasks": [
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task1",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": null,
            "dependsOn": [],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        },
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task2",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": "all_success",
            "dependsOn": ["task1"],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        }
    ] 
}
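
To make the role of `trigger_rule` in the response above more concrete, here is a small Python sketch of how a backend might decide whether a task is allowed to start. Everything here is an assumption for illustration: the helper name `is_ready` is hypothetical, a `null` rule is treated as `all_success`, and `all_success` is taken to mean "every task listed in `dependsOn` has status `COMPLETED`". The thread does not define these semantics or any other rule values.

```python
def is_ready(task, status_by_name):
    """Decide whether `task` may start, given the statuses of other tasks.

    Assumptions (not defined in the proposal): a missing or null
    `trigger_rule` behaves like "all_success", and "all_success" requires
    every upstream task to have status "COMPLETED".
    """
    upstream = [status_by_name[name] for name in task.get("dependsOn") or []]
    rule = task.get("trigger_rule") or "all_success"
    if rule == "all_success":
        # all() over an empty list is True, so root tasks are always ready.
        return all(status == "COMPLETED" for status in upstream)
    raise ValueError(f"unsupported trigger_rule: {rule!r}")


task1 = {"name": "task1", "trigger_rule": None, "dependsOn": []}
task2 = {"name": "task2", "trigger_rule": "all_success", "dependsOn": ["task1"]}

print(is_ready(task1, {}))                      # True
print(is_ready(task2, {"task1": "COMPLETED"}))  # True
print(is_ready(task2, {"task1": "FAILED"}))     # False
```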

Providing such an interface would allow users to extend the scheduler to integrate with external orchestrators or schedulers, such as Airflow, to schedule and run notebook DAGs.

Additional context

@akshaychitneni

@3coins @JasonWeill @andrii-i Would you be able to attend the Jupyter Server meeting this week or next to start the discussion on this work? I work with @Zsailer on the same team and would like to start collaborating with you all on this feature. Thanks!

@3coins
Collaborator

3coins commented Dec 4, 2023

@akshaychitneni
I will join this week; let’s discuss more. Before we start this task, it might be useful to move to a Dask-based backend for the Scheduler, which would make running workflows much simpler.
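
For readers unfamiliar with what a graph-aware backend buys here, the following Python sketch shows the core loop such a backend would take care of: start every task whose dependencies have finished, in parallel, and release downstream tasks as upstream ones complete. This is not Dask and not the scheduler's implementation. It is a standard-library stand-in, and `run_dag` and `run_task` are hypothetical names, where `run_task` represents whatever actually executes a notebook.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, run_task, max_workers=4):
    """Run tasks in dependency order, in parallel where the DAG allows.

    `run_task(name)` is a hypothetical callable that executes one task.
    Returns task names in the order they finished.
    """
    graph = {t["name"]: set(t.get("dependsOn") or []) for t in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}  # future -> task name
        while sorter.is_active():
            # Submit every task whose dependencies are all done.
            for name in sorter.get_ready():
                pending[pool.submit(run_task, name)] = name
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                name = pending.pop(future)
                future.result()    # re-raise any exception from the task
                sorter.done(name)  # unblock tasks that depend on this one
                finished.append(name)
    return finished


tasks = [
    {"name": "task1", "dependsOn": []},
    {"name": "task2", "dependsOn": ["task1"]},
]
print(run_dag(tasks, run_task=lambda name: None))  # ['task1', 'task2']
```

A Dask or Ray backend would replace this hand-written loop with the library's own task graph and scheduler, which is the simplification being suggested.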

@Zsailer
Member

Zsailer commented Dec 5, 2023

Woohoo 🎉 I'm looking forward to hanging out with all of you cool people in the Jupyter Server meeting.

@andrii-i
Collaborator Author

andrii-i commented Dec 6, 2023

I will join the meeting as well.

@Zsailer
Member

Zsailer commented May 13, 2024

Hi all, as hinted at by #517, our team has been developing a UX for scheduling a DAG of notebooks, using Jupyter-scheduler as the starting point. We've already discussed this with your team, but we wanted to share a "sneak peek" of the feature in the open and begin collaborating openly here to get this work upstreamed (assuming folks would benefit).

Here is a video demonstrating the UX we built:

Workflows.mp4

In short, this is essentially a rewrite of the frontend and would replace the current scheduler UI with an editor that covers the broader use case of scheduling a DAG of notebooks. @sathishlxg led this work and can work with y'all here to make the transition smooth.

We've also made significant changes to the REST API models to handle individual tasks. @akshaychitneni and @nsingl00 led this work, so they will work with you here to make the appropriate changes.

We recognize that this is a pretty disruptive change to the package. We're willing and able to help with the merging, releasing, and long-term maintenance of this work.

We discussed hosting a regular meeting to get things open-sourced as soon as possible. We can use this thread (at least to start) to discuss next steps.

Thanks all!

@andrii-i
Collaborator Author

andrii-i commented May 14, 2024

@Zsailer, @sathishlxg, @akshaychitneni, @nsingl00 thank you for open-sourcing your work. I'm excited to work on this with you. Other than scheduling a meeting, a good next step would be to open a PR or a branch with code, even if it would not work as-is. I would be happy to then work to make the necessary changes and integrate the new functionality.

@ellisonbg
Contributor

Awesome work, look forward to learning more!
