Archiving all-files scheduler #388

Merged 17 commits on Aug 11, 2023

Conversation

JasonWeill
Collaborator

@JasonWeill JasonWeill commented Jun 28, 2023

WIP fix for #349.

Creates the AllFilesArchivingExecutionManager and AllFilesArchivingScheduler classes, which are intended to archive all output files and side-effect files produced by job runs.

You can use these classes by running:

jupyter lab \
  --SchedulerApp.scheduler_class=jupyter_scheduler.scheduler.AllFilesArchivingScheduler \
  --Scheduler.execution_manager_class=jupyter_scheduler.executors.AllFilesArchivingExecutionManager

After the user chooses to download a job's output files, the side-effect files are unpacked into the directory that holds the first output file.

In the screenshot below, the job ran writefile.ipynb, which produces the side-effect file pun.txt. After the job's output is downloaded, pun.txt appears in the jobs subdirectory.

image
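
A minimal sketch of the archive-then-unpack flow described above, using only the Python standard library. The function names, the archive name output.tar.gz, and the staging/jobs directory layout are hypothetical and chosen for illustration; they are not taken from the jupyter_scheduler code in this PR.

```python
import os
import tarfile
import tempfile


def archive_staging_dir(staging_dir, archive_path):
    """Pack every file under staging_dir into a gzip-compressed tarball."""
    archive_abs = os.path.abspath(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(staging_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == archive_abs:
                    continue  # do not pack the archive into itself
                # Store paths relative to the staging dir so that they
                # unpack cleanly into any destination directory.
                tar.add(full, arcname=os.path.relpath(full, staging_dir))


def extract_into(archive_path, dest_dir):
    """Unpack regular files from the tarball into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_abs = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and devices
            target = os.path.realpath(os.path.join(dest_abs, member.name))
            if os.path.commonpath([dest_abs, target]) != dest_abs:
                continue  # refuse members that would land outside dest_dir
            tar.extract(member, dest_dir)


def demo():
    """Archive a fake staging dir, unpack it, and list what was unpacked."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = os.path.join(tmp, "staging")  # hypothetical layout
        jobs = os.path.join(tmp, "jobs")
        os.makedirs(staging)
        for name in ("writefile.ipynb", "pun.txt"):
            with open(os.path.join(staging, name), "w") as f:
                f.write("placeholder")
        archive = os.path.join(staging, "output.tar.gz")
        archive_staging_dir(staging, archive)
        extract_into(archive, jobs)
        return sorted(os.listdir(jobs))
```

Calling `demo()` packs a notebook and its side-effect file from a temporary staging directory and unpacks both into a separate jobs directory, which mirrors the writefile.ipynb and pun.txt case in the screenshot.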

@github-actions
Contributor

Binder 👈 Launch a Binder on branch JasonWeill/jupyter-scheduler/all-files-scheduler

@dlqqq dlqqq added the enhancement New feature or request label Aug 8, 2023
Collaborator

@dlqqq dlqqq left a comment

@JasonWeill Thank you for working on this! Left some feedback.

jupyter_scheduler/job_files_manager.py (outdated, resolved)
jupyter_scheduler/job_files_manager.py (outdated, resolved)
jupyter_scheduler/job_files_manager.py (outdated, resolved)
jupyter_scheduler/job_files_manager.py (outdated, resolved)
jupyter_scheduler/executors.py (outdated, resolved)
jupyter_scheduler/executors.py (outdated, resolved)
jupyter_scheduler/job_files_manager.py (outdated, resolved)
jupyter_scheduler/executors.py (outdated, resolved)
jupyter_scheduler/scheduler.py (outdated, resolved)
Collaborator

@dlqqq dlqqq left a comment

There is also a failing unit test blocking CI:

FAILED jupyter_scheduler/tests/test_job_files_manager.py::test_downloader_download[output_formats1-output_filenames1-staging_paths1-/home/runner/work/jupyter-scheduler/jupyter-scheduler/jupyter_scheduler/tests/test_files_output-False] - AssertionError: assert False
 +  where False = <function exists at 0x7feb182f7420>('/home/runner/work/jupyter-scheduler/jupyter-scheduler/jupyter_scheduler/tests/test_files_output/helloworld-out.ipynb')
 +    where <function exists at 0x7feb182f7420> = <module 'posixpath' (frozen)>.exists
 +      where <module 'posixpath' (frozen)> = os.path

No need for an urgent fix here, since CI is failing on main anyway until we merge #417. However, this should definitely be fixed after rebasing and before merging.

@JasonWeill
Collaborator Author

Simplifies archiving and unarchiving logic per @dlqqq. Seeking external feedback about having these new classes replace the old Archiving* classes; if there's no opposition, I can work on this tomorrow. Also, we need to document these new classes and how to use them.

@JasonWeill JasonWeill changed the title WIP: Archiving all-files scheduler Archiving all-files scheduler Aug 11, 2023
@JasonWeill JasonWeill marked this pull request as ready for review August 11, 2023 18:09
Collaborator

@dlqqq dlqqq left a comment

Awesome work! 🎉

@JasonWeill JasonWeill merged commit b897aa5 into jupyter-server:main Aug 11, 2023
3 of 5 checks passed
JasonWeill added a commit to JasonWeill/jupyter-scheduler that referenced this pull request Aug 11, 2023
* Fix typo in comment

* WIP: Adds new scheduler

* writes individual files

* WIP: Write zip file

* WIP: Trying to get zip file to be written only on scheduled job runs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* WIP: Removes zip type, incremental work for archiving work dir

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Create tar.gz in staging subdir

* Capture side effect files in staging dir

* Extracts files

* Add filter

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update jupyter_scheduler/job_files_manager.py

Co-authored-by: david qiu <[email protected]>

* Simplifies cleanup logic

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updates docs, deletes old Archiving*, renames AllFilesArchiving

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: david qiu <[email protected]>
@andrii-i andrii-i added this to the 2.1.0 milestone Aug 14, 2023
dlqqq added a commit that referenced this pull request Aug 15, 2023
* Archiving all-files scheduler (#388)

* Avoids option compatible only with Python 3.11

* Fix JFM tests (#424)

* fix JFM tests

* pre-commit

* add minor comment

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: david qiu <[email protected]>
Labels
enhancement New feature or request