Look into Jupyter Scheduler for possible integration into Nebari #1755

Closed
Adam-D-Lewis opened this issue Apr 26, 2023 · 4 comments · Fixed by #1832
Labels
area: JupyterLab needs: investigation 🔍 Someone in the team needs to find the root cause and replicate this bug type: enhancement 💅🏼 New feature or request

Comments

@Adam-D-Lewis
Member

Context

https://blog.jupyter.org/introducing-jupyter-scheduler-f9e82676c388

Value and/or benefit

Allow users to schedule notebooks in a Jupyter-native way

Anything else?

No response

@Adam-D-Lewis Adam-D-Lewis added the needs: triage 🚦 Someone needs to have a look at this issue and triage label Apr 26, 2023
@pavithraes pavithraes added type: enhancement 💅🏼 New feature or request needs: investigation 🔍 Someone in the team needs to find the root cause and replicate this bug area: JupyterLab and removed needs: triage 🚦 Someone needs to have a look at this issue and triage labels Apr 27, 2023
@iameskild
Member

iameskild commented May 18, 2023

I have been looking into what it would take to integrate Jupyter-Scheduler into Nebari, specifically with Argo-Workflows.

Jupyter-Scheduler lets users extend its ExecutionManager class, which we can take advantage of to submit a workflow that runs the notebook.

This would require a small new package, perhaps called argo-workflow-executor (or similar), that exposes an ArgoWorkflowExecutor class. To submit workflows, we can use Hera-Workflows (the Python client for Argo-Workflows).

This new executor class can then be referenced in jupyter_server_config.py as follows:

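from argo_workflow_executor import ArgoWorkflowExecutor  # hypothetical package proposed above

c = get_config()  # get_config() is provided by Jupyter when it loads this file

# Have Jupyter-Scheduler run jobs through the Argo-backed executor
c.Scheduler.execution_manager_class = ArgoWorkflowExecutor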
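To make the executor idea more concrete, here is a minimal sketch of the kind of Argo Workflow manifest an ArgoWorkflowExecutor might submit for a single notebook run. It is built as a plain dict so it can be handed to Hera or the Kubernetes API. The helper name, the container image, the "dev" namespace, and the choice of jupyter nbconvert to execute the notebook are all assumptions for illustration, not how Jupyter-Scheduler or Nebari actually run jobs.

```python
from pathlib import Path


def build_notebook_workflow(job_id: str, input_path: str, image: str,
                            namespace: str = "dev") -> dict:
    """Return an Argo Workflow manifest that executes a single notebook.

    Hypothetical helper: the function name, image, and default namespace are
    illustrative assumptions, not part of Jupyter-Scheduler, Hera, or Nebari.
    """
    # Name the executed copy after the job so repeated runs do not collide
    output_name = f"{Path(input_path).stem}-{job_id}"
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {
            # generateName makes Argo append a random suffix to each run's name
            "generateName": f"notebook-job-{job_id}-",
            "namespace": namespace,
        },
        "spec": {
            "entrypoint": "run-notebook",
            "templates": [
                {
                    "name": "run-notebook",
                    "container": {
                        "image": image,
                        # nbconvert runs every cell and saves an executed notebook
                        "command": ["jupyter", "nbconvert"],
                        "args": ["--to", "notebook", "--execute",
                                 "--output", output_name, input_path],
                    },
                }
            ],
        },
    }
```

For example, build_notebook_workflow("abc123", "/home/user/report.ipynb", "example/jupyterlab:latest") yields a Workflow whose single container executes report.ipynb and writes report-abc123.ipynb. Keeping manifest construction separate from submission would let the executor be tested without a running Argo server.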

The workflow admission controller improvements Adam has been working on (#1741) are a perfect complement to this solution, as they will ensure that the user's conda environments and home directory are copied over to the workflow specs.

@Adam-D-Lewis
Member Author

This is tangentially related, but it would be awesome to have the necessary Argo env vars set by KubeSpawner when the user pod spins up (nebari-dev/nebari-docs#278 (comment)); it would make the user experience that much better.
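A minimal sketch of what those env vars could look like, assuming the Argo CLI's documented ARGO_* connection variables are the ones users need. The server address, namespace, and base href are placeholders, and the helper name is hypothetical. ARGO_TOKEN is deliberately left out because it is a per-user secret rather than a shared setting.

```python
def argo_environment(namespace: str, server: str, base_href: str = "/argo/") -> dict:
    """Return Argo CLI connection variables for a user's pod.

    Hypothetical helper: the ARGO_* names follow the Argo CLI documentation,
    but every value passed in here is a deployment-specific assumption.
    """
    return {
        "ARGO_SERVER": server,        # host:port of the Argo Server
        "ARGO_HTTP1": "true",         # talk to the server over HTTP/1 instead of gRPC
        "ARGO_SECURE": "true",        # connect over TLS
        "ARGO_BASE_HREF": base_href,  # path prefix Argo is served under
        "ARGO_NAMESPACE": namespace,  # default namespace for submitted workflows
    }
```

In jupyterhub_config.py, the returned dict could be merged into c.KubeSpawner.environment so that every user pod starts with these variables already set, before JupyterLab launches.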

@iameskild iameskild self-assigned this May 22, 2023
@iameskild
Member

As I've worked on integrating Jupyter-Scheduler into Nebari, I've found it to be more challenging than originally envisioned. This is due to three main factors:

  • It took me a while to recognize that our version of JupyterHub (1.5) spawns the user's server (i.e., JupyterLab) using jupyterhub-singleuser as opposed to jupyter server; as a result, Jupyter-Scheduler was not recognized as a valid extension, similar to this issue. My understanding is that later versions of JupyterHub use jupyter server; in the meantime, we will need to make the following modifications for this extension to work:

    • We will need to set the following environment variable (through KubeSpawner):
      JUPYTERHUB_SINGLEUSER_APP = jupyter_server.serverapp.ServerApp
    • We will need to rename jupyter_notebook_config.py to jupyter_server_config.py.
  • The more complicated reason integrating Jupyter-Scheduler into Nebari is tricky is that Jupyter-Scheduler is a JupyterLab extension that runs in each user's JupyterLab pod and has no awareness of JupyterHub, Argo-Workflows, etc. This means jobs will only run while the user's JupyterLab pod is running, which defeats the purpose of integrating such an extension.

    • I believe I can get around this by submitting "scheduled jobs" as Cron Workflows, but this requires deeper modifications than I initially anticipated.
  • The last reason this is more complicated has to do with how Argo-specific environment variables (e.g., ARGO_TOKEN) need to be set. Initially, I thought they could be set in the user's running JupyterLab pod. Unfortunately, they need to be set on the pod before JupyterLab (and Jupyter-Scheduler) launches.

    • I'm still working on how to accomplish this most effectively. My first thought is to add another initContainer that pulls these environment variables in, but I still need to investigate where these env vars would come from.
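The Cron Workflow workaround from the second point can be sketched as follows. When a user creates a recurring job, the extension would create an Argo CronWorkflow, and Argo itself would then launch each run whether or not the user's JupyterLab pod is up. This assumes the scheduled job supplies a five-field cron expression and a timezone; the helper name, the image, and the use of jupyter nbconvert are hypothetical choices for illustration.

```python
def build_notebook_cron_workflow(name: str, schedule: str, timezone: str,
                                 input_path: str, image: str) -> dict:
    """Return an Argo CronWorkflow manifest that re-runs a notebook on a schedule.

    Hypothetical helper: assumes the scheduled job supplies a standard
    five-field cron expression and an IANA timezone name.
    """
    if len(schedule.split()) != 5:
        raise ValueError(f"expected a five-field cron expression, got {schedule!r}")
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "timezone": timezone,
            # Skip a run if the previous one is still executing
            "concurrencyPolicy": "Forbid",
            # Each tick creates an ordinary Workflow from this spec,
            # independent of whether the user's JupyterLab pod is running
            "workflowSpec": {
                "entrypoint": "run-notebook",
                "templates": [
                    {
                        "name": "run-notebook",
                        "container": {
                            "image": image,
                            "command": ["jupyter", "nbconvert"],
                            "args": ["--to", "notebook", "--execute", input_path],
                        },
                    }
                ],
            },
        },
    }
```

The key design point is that the schedule lives in the cluster rather than in the JupyterLab process, so the user's server can be culled without cancelling future runs.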

@iameskild
Member

We can add the Argo service account's ARGO_TOKEN secret as an environment variable on a per-user basis (admin, developer, or viewer). This will require some modifications to the Nebari-Workflow-Controller (see PR for those updates).
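A minimal sketch of the per-user mapping, assuming each role has its own Argo service account and a token Secret named "<service-account>.service-account-token" (the naming used in Argo's access-token documentation). The role-to-service-account names and both helper functions are hypothetical. The result is a standard Kubernetes container env entry that reads the token from the Secret. Note that Argo's docs show ARGO_TOKEN prefixed with "Bearer ", which a raw secret reference does not add on its own.

```python
# Hypothetical role -> Argo service account names; not Nebari's actual names
ROLE_SERVICE_ACCOUNTS = {
    "admin": "argo-admin",
    "developer": "argo-developer",
    "viewer": "argo-viewer",
}

# Most privileged role first, so a user in several groups gets the highest one
ROLE_PRIORITY = ["admin", "developer", "viewer"]


def highest_role(groups: set) -> "str | None":
    """Pick the most privileged Argo role among a user's groups, if any."""
    for role in ROLE_PRIORITY:
        if role in groups:
            return role
    return None


def argo_token_env(role: str) -> dict:
    """Return a Kubernetes EnvVar spec that loads ARGO_TOKEN from a Secret."""
    try:
        service_account = ROLE_SERVICE_ACCOUNTS[role]
    except KeyError:
        raise ValueError(f"unknown Argo role: {role!r}") from None
    return {
        "name": "ARGO_TOKEN",
        "valueFrom": {
            "secretKeyRef": {
                # Assumed Secret naming convention; see the lead-in
                "name": f"{service_account}.service-account-token",
                # Service-account token Secrets store the token under "token"
                "key": "token",
            }
        },
    }
```

A user in both the viewer and admin groups would get argo_token_env(highest_role({"viewer", "admin"})), i.e. the admin token reference.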

3 participants