Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log notebooks with MLFlow #493

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from
Draft

Conversation

andrii-i
Copy link
Collaborator

@andrii-i andrii-i commented Feb 27, 2024

Prototype of MLFlow - Scheduler integration:

  • Add an option to "Log with MLFlow" during job / job definition creation. When activated, input notebooks "Run now" / single jobs are logged as single experiment with a single run, "Scheduled notebooks" / job definitions are logged as an experiment with a run for every job
  • Modify models, scheduler, and executors logic to accommodate such logging
  • Add "Open in MLFlow" button to detail view that opens a run (for a job) or experiment (for a job definition) in MLFLow Tracker UI where logged artifacts (input notebook, outputs) can be previewed
  • Notebook cells tagged with mlflow_log (can be done via metadata editor) are logged as separate artifacts of the appropriate format (image, pdf, text, html, markdown) when notebook is ran with MLFlow logging enabled. mlflow_log tag logs both input (cell content) and output (result of running the cell), mlflow_log_input tag logs input only, mlflow_log_output tag logs output only. Note that MLFlow UI can only preview image, pdf, text, and html files.

To install and use, clone this PR/andrii-i:mlflow branch and follow the development install steps form the jupyter-scheduler readthedocs.

Beyond the scope of this prototype:

  • MLFlow could be used to replace existing Job files management functionality of the Scheduler
  • Possibility to make MLFlow be able to run notebooks: MLFlow has python_function abstraction that tells it how to run models (essentially, bundles of files). This could be potentially leveraged to package notebooks and dependencies as models and run them directly in MLFlow via python_function abstraction.

Screenshot 2024-02-27 at 10 09 15 AM copy
Screenshot 2024-02-27 at 10 09 32 AM copy
Screenshot 2024-03-05 at 10 50 51 AM
Screenshot 2024-03-05 at 11 02 28 AM

@andrii-i andrii-i added the enhancement New feature or request label Feb 27, 2024
@andrii-i andrii-i force-pushed the mlflow branch 3 times, most recently from aa2e878 to 317d4cc Compare February 27, 2024 18:43
@andrii-i andrii-i force-pushed the mlflow branch 2 times, most recently from bbded11 to 0a3abe0 Compare May 15, 2024 00:23
@andrii-i andrii-i closed this May 15, 2024
@andrii-i andrii-i reopened this Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant