Document how to run pytorch-notebook docker container #316

Merged: 3 commits merged on Apr 28, 2022

Member

@weiji14 weiji14 commented Apr 28, 2022

Follow up of #315 to document how to run the pytorch-notebook docker image.

Output of docker run -it --rm --gpus all pangeo/pytorch-notebook:master nvidia-smi should show something like this:

Thu Apr 28 02:21:34 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A500...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P3    24W /  N/A |      5MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2574      G                                       4MiB |
+-----------------------------------------------------------------------------+
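
For scripting the check above, here is a minimal Python sketch that assembles the same `docker run` invocation as an argument list. The helper name `gpu_run_command` is hypothetical (it is not part of this repository), and the second command assumes that `python` and `torch` are available inside the image, which seems likely for a pytorch-notebook image but is not confirmed in this thread. The sketch only prints the commands; it does not start a container.

```python
import shlex

# Image tag taken from the command quoted above.
IMAGE = "pangeo/pytorch-notebook:master"


def gpu_run_command(inner, image=IMAGE):
    """Return the argv list for `docker run` with all host GPUs exposed.

    Hypothetical helper for illustration; `inner` is the command to run
    inside the container.
    """
    return ["docker", "run", "-it", "--rm", "--gpus", "all", image, *inner]


# Same check as above: run nvidia-smi inside the container.
smi_cmd = gpu_run_command(["nvidia-smi"])

# Assumption: the image ships python with torch installed.
torch_cmd = gpu_run_command(
    ["python", "-c", "import torch; print(torch.cuda.is_available())"]
)

# Print shell-quoted versions that can be pasted into a terminal.
print(shlex.join(smi_cmd))
print(shlex.join(torch_cmd))
```

If the NVIDIA runtime is set up correctly, the first command should print a table like the one above; the second is only a suggested extra check.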

TODO update https://pangeo-data.github.io/pangeo-stacks/images.html too?

@github-actions
Contributor

Binder 👈 Try on Mybinder.org!
Binder 👈 Try on Pangeo GCP Binder!
Binder 👈 Try on Pangeo AWS Binder!

@scottyhq
Member

> TODO update https://pangeo-data.github.io/pangeo-stacks/images.html too?

Good point! I opened #319 to track this.

> how to run the pytorch-notebook docker image.

I think adding this in pytorch-notebook/readme.md would be great for now. Some information about required hardware would be good (maybe just links to the NVIDIA docs from #315).

For those who don't have a local GPU, one of the easiest ways I've found to run a Docker container with PyTorch on a GPU is via Azure Container Instances (https://github.com/Denolle-Lab/azure/tree/main/aci_plus_volume). Not sure if it's worth adding some documentation on that? Currently we just have https://github.com/pangeo-data/pangeo-docker-images#how-to-launch-an-image-with-a-cloud-provider-on-your-own-account

@weiji14
Member Author

weiji14 commented Apr 28, 2022

> how to run the pytorch-notebook docker image.
>
> I think adding this in pytorch-notebook/readme.md would be great for now. Some information about required hardware would be good (maybe just links to the NVIDIA docs from #315).

Hmm yes, nvidia-docker isn't exactly a trivial install. I thought to keep it in the main README.md because this will be needed for the tensorflow docker image as well, and to be fair, the guide at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html seems pretty good already and will be better maintained than here.
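
As a rough preflight before working through that install guide, a small Python sketch like the one below can report which command-line tools are missing from `PATH`. The function name `missing_tools` is hypothetical, and the check is deliberately shallow: finding `nvidia-smi` on the host only suggests that an NVIDIA driver is installed, and it says nothing about whether the NVIDIA Container Toolkit itself is configured for Docker.

```python
import shutil


def missing_tools(tools=("docker", "nvidia-smi")):
    """Return the names from `tools` that are not found on PATH.

    Hypothetical helper for illustration; it does not verify that the
    NVIDIA Container Toolkit is installed or configured.
    """
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("docker and nvidia-smi are both on PATH")
```

The definitive test remains running `nvidia-smi` inside the container with `--gpus all`, as described at the top of this pull request.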

> For those who don't have a local GPU, one of the easiest ways I've found to run a Docker container with PyTorch on a GPU is via Azure Container Instances (https://github.com/Denolle-Lab/azure/tree/main/aci_plus_volume). Not sure if it's worth adding some documentation on that? Currently we just have https://github.com/pangeo-data/pangeo-docker-images#how-to-launch-an-image-with-a-cloud-provider-on-your-own-account

It looks quite involved, but I suppose we could copy some stuff from that. Ideally there would be a one-click GPU-enabled binder link (behind a login of course). I know that microsoft/torchgeo#316 added an 'Open on Planetary Computer' button, but that won't pull the pangeo/pytorch-notebook docker image, just the files from git.

@scottyhq
Member

> Ideally there would be a one-click GPU-enabled binder link (behind a login of course)

Agreed! Not sure if it's in scope for the upcoming revamped pangeo-binder (2i2c-org/infrastructure#919).

Happy to merge this as is if you'd like.

@weiji14 weiji14 marked this pull request as ready for review April 28, 2022 17:53
Member

@scottyhq scottyhq left a comment

thanks!

@scottyhq scottyhq merged commit b175aed into pangeo-data:master Apr 28, 2022
@weiji14 weiji14 deleted the doc/nvidia-docker branch April 28, 2022 18:43