Extend Jupyter-scheduler to create and manage jobs with multiple tasks #469

Closed
akshaychitneni opened this issue Nov 28, 2023 · 2 comments
Labels: duplicate (This issue or pull request already exists), enhancement (New feature or request)

Comments

@akshaychitneni

Problem

Jupyter Scheduler currently enables users to create and manage background jobs that execute a notebook file. We would like to extend jobs to support multiple notebook tasks, where each task executes a notebook file, and to allow dependencies between tasks. We want to initiate the discussion on extending Jupyter Scheduler so that users can create and manage notebook workflows and their associated runs in the Jupyter workspace. This would also require a UX that lets users easily create tasks and their dependencies using a DAG editor.

Proposed Solution

Tentative model:

DescribeJobDefinition API Response:

{
    "name": "test1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "schedule": "0 0 * * MON-FRI",
    "timezone": "America/Los_Angeles",
    "job_definition_id": "b5c6099c-9bec-4a04-968f-37e9c23c0f9b",
    "create_time": 1701124758318,
    "update_time": 1701124758317,
    "active": true,
    "tasks": [
     {  
        "name": "task1",
        "input_filename": "Untitled1.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": null,
        "dependsOn": []
     },
     {
        "name": "task2",
        "input_filename": "Untitled2.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": "all_success",
        "dependsOn": ["task1"]
     }
    ]
}
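
To make the `dependsOn` field concrete, here is a minimal sketch of how a backend might turn the `tasks` list into a run order. It is not part of Jupyter Scheduler; `task_execution_order` is a hypothetical helper, and it assumes task names are unique within a job definition and that `dependsOn` holds the names of upstream tasks, as in the example above. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def task_execution_order(job_definition: dict) -> list[str]:
    """Return task names in an order that respects each task's dependsOn list.

    Hypothetical helper: assumes the tentative model above, where every task
    has a unique "name" and "dependsOn" lists the names of upstream tasks.
    """
    tasks = job_definition.get("tasks") or []
    names = [task["name"] for task in tasks]
    known = set(names)
    if len(known) != len(names):
        raise ValueError("task names must be unique within a job definition")

    # Map each task to the set of tasks that must finish before it starts.
    graph = {}
    for task in tasks:
        deps = task.get("dependsOn") or []
        missing = [dep for dep in deps if dep not in known]
        if missing:
            raise ValueError(
                f"task {task['name']!r} depends on unknown task(s): {missing}"
            )
        graph[task["name"]] = set(deps)

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


definition = {
    "name": "test1",
    "tasks": [
        {"name": "task1", "input_filename": "Untitled1.ipynb", "dependsOn": []},
        {"name": "task2", "input_filename": "Untitled2.ipynb", "dependsOn": ["task1"]},
    ],
}
order = task_execution_order(definition)
print(order)  # ['task1', 'task2']
```

Rejecting unknown names and cycles when the definition is created would keep invalid DAGs from ever being scheduled, which also gives a DAG editor a clear error to surface.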

DescribeJob API Response:

{
    "name": "job1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "job_id": "27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "url": "/jobs/27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "create_time": 1696264966089,
    "update_time": 1696264968551,
    "start_time": 1696264967241,
    "end_time": 1696264968550,
    "status": "COMPLETED",
    "status_message": null,
    "tasks": [
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task1",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": null,
            "dependsOn": [],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        },
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task2",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": "all_success",
            "dependsOn": ["task1"],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        }
    ] 
}
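
The proposal does not define the semantics of `trigger_rule`, so the following is only a sketch under stated assumptions: that `"all_success"` means every task named in `dependsOn` has reached the `COMPLETED` status, and that a `null` rule behaves the same way. `task_is_ready` is a hypothetical helper, and the `FAILED` status used in the example is assumed rather than taken from the responses above.

```python
def task_is_ready(task: dict, status_by_name: dict[str, str]) -> bool:
    """Decide whether a task may start, given the statuses of other tasks.

    Assumptions (not defined in the proposal): a null trigger_rule is
    treated as "all_success", and "COMPLETED" is the only success status.
    """
    rule = task.get("trigger_rule") or "all_success"
    upstream = [status_by_name.get(name) for name in task.get("dependsOn") or []]
    if rule == "all_success":
        # A task with no upstream tasks is trivially ready.
        return all(status == "COMPLETED" for status in upstream)
    raise ValueError(f"unsupported trigger_rule: {rule!r}")


task2 = {"name": "task2", "trigger_rule": "all_success", "dependsOn": ["task1"]}

ready_after_success = task_is_ready(task2, {"task1": "COMPLETED"})
ready_after_failure = task_is_ready(task2, {"task1": "FAILED"})
print(ready_after_success, ready_after_failure)  # True False
```

Other rules would slot in as additional branches once their meaning is agreed on.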

Providing such an interface would allow users to extend the scheduler to integrate with external orchestrators or schedulers, such as Airflow, to schedule and run notebook DAGs.

akshaychitneni added the enhancement label on Nov 28, 2023
JasonWeill added the duplicate label on Nov 28, 2023
@JasonWeill
Collaborator

@akshaychitneni Thank you so much for contributing to Jupyter Scheduler! We have an issue #411 to cover multi-task jobs, opened earlier. I'm going to close this one as a duplicate, but let's keep the conversation going on the earlier issue.
