Commit

Merge pull request #473 from yuvipanda/docs
Add a readthedocs setup
yuvipanda authored Jul 13, 2023
2 parents ad68f26 + 04a4042 commit 554675c
Showing 13 changed files with 268 additions and 127 deletions.
17 changes: 17 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,17 @@
# Configuration on how ReadTheDocs (RTD) builds our documentation
# ref: https://readthedocs.org/projects/pangeo-docker-images/
# ref: https://docs.readthedocs.io/en/stable/config-file/v2.html
#
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

sphinx:
  configuration: docs/conf.py

python:
  install:
    - requirements: docs/requirements.txt
131 changes: 4 additions & 127 deletions README.md
@@ -1,10 +1,13 @@
# Pangeo Docker Images

[![Documentation build status](https://img.shields.io/readthedocs/pangeo-docker-images?logo=read-the-docs)](https://pangeo-docker-images.readthedocs.org/en/latest/)
![Build Status](https://github.com/pangeo-data/pangeo-docker-images/workflows/Build/badge.svg)
![Publish Status](https://github.com/pangeo-data/pangeo-docker-images/workflows/Publish/badge.svg)
![DockerHub Version](https://img.shields.io/docker/v/pangeo/base-image?sort=date)

The images defined in this repository capture reproducible computing environments used by [Pangeo Cloud](https://pangeo.io/cloud.html). They build on top of the Ubuntu operating system and include [conda environments](https://conda.io/projects/conda) with a curated set of Python packages for geospatial analysis. While intended for Pangeo Cloud, they can be used outside of Pangeo infrastructure too!
The images defined in this repository capture reproducible computing environments used by [Pangeo Cloud](https://pangeo.io/cloud.html). They build on top of the Ubuntu operating system and include [conda environments](https://conda.io/projects/conda) with a curated set of Python packages for geospatial analysis. While initially intended for Pangeo Cloud, they can be used outside of Pangeo infrastructure too!

More details can be found in [our documentation](https://pangeo-docker-images.readthedocs.io).

Images are hosted on [DockerHub](https://hub.docker.com/u/pangeo) and on [Quay.io](https://quay.io/organization/pangeo)

@@ -34,132 +37,6 @@ graph TD;
click forge "https://hub.docker.com/r/pangeo/forge" "Open this in a new tab" _blank
```

### How to use the pangeo-notebook image with Binder
A major use-case for these images is running an ephemeral server on the Cloud with BinderHub. Anyone can launch a server running the latest-and-greatest `pangeo-notebook` image with the following URL

* https://mybinder.org/v2/gh/pangeo-data/pangeo-docker-images/HEAD?urlpath=lab

NOTE: the link above resolves to the [`pangeo-notebook` image](https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook) and not `base-notebook`, `ml-notebook` or `pytorch-notebook` that are also defined in this repository. Currently BinderHubs map to a single image definition per repository.

#### Use nbgitpuller to automatically load content

The binder link above will launch Jupyterlab without any notebooks or other content. From Jupyterlab you can then upload notebooks or run `git pull` commands to retrieve content in another GitHub repository. However, it can be very useful to pre-load content when a server launches. [nbgitpuller link generator](https://jupyterhub.github.io/nbgitpuller/link) is very useful for this!

Below is a link to illustrate launching [`pangeo-notebook/2021.09.30`](https://github.com/pangeo-data/pangeo-docker-images/blob/2021.09.30/pangeo-notebook/packages.txt) and automatically pulling the notebooks housed in https://github.com/pangeo-data/cog-best-practices.

* https://mybinder.org/v2/gh/pangeo-data/pangeo-docker-images/2021.09.30?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fpangeo-data%252Fcog-best-practices%26urlpath%3Dlab%252Ftree%252Fcog-best-practices%252F%26branch%3Dmain

#### Customize your environment
Advanced users may want a highly customized environment that still works on Pangeo BinderHubs. You can do that by building off the pangeo `base-image` following our [template repository example](https://github.com/pangeo-data/pangeo-binder-template). Further documentation on the configuration files in the `binder` subfolder can be found in the [repo2docker documentation](https://repo2docker.readthedocs.io/en/latest/config_files.html#configuration-files).

### How to launch Jupyterlab locally with one of these images
```
docker run -it --rm -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0
```
**NOTE:** images are mirrored on quay.io so you can also pull `quay.io/pangeo/pangeo-notebook:latest`

To access files from your local hard drive from within the Docker Jupyterlab, you need to use a Docker [volume mount](https://docs.docker.com/storage/volumes/). The following command will mount your home directory in the docker container and launch the Jupyterlab from there.

```
docker run -it --rm --volume $HOME:$HOME -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0 $HOME
```

You can also run commands other than `jupyter` when starting a Docker container:

```
docker run -it --rm pangeo/base-notebook:2021.09.30 /bin/bash
```

If you're doing Machine Learning and want to use NVIDIA GPUs,
follow the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
to install `nvidia-docker`, and then start the Docker container like so:

```
docker run -it --rm --gpus all -p 8888:8888 pangeo/pytorch-notebook:master jupyter lab --ip 0.0.0.0
```

### How to launch an image with a Cloud provider on your own account

Many Cloud providers offer services to run Docker containers in their data centers.
Instructions will vary, so we don't provide specifics here, but as an example,
check out these docs for running containers on the cloud via Docker Compose:

- [Amazon Elastic Container Service (ECS)](https://docs.docker.com/cloud/ecs-integration)
- [Azure Container Instances (ACI)](https://docs.docker.com/cloud/aci-integration)

#### GitHub Codespaces (Azure)

You can launch the pangeo-notebook environment via [GitHub Codespaces](https://github.com/features/codespaces) with this button:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/pangeo-data/pangeo-docker-images?quickstart=1)


### How to install just the conda environment

If you're used to managing conda environments on your personal computer, or running a hosted JupyterLab service like [Google Colab](https://colab.research.google.com) or [AWS SageMaker Studio Lab](https://studiolab.sagemaker.aws), you can exactly match a tagged pangeo-notebook conda environment. For example, below we install the `pangeo-notebook` environment tagged on `2021.12.02`:

```
%conda create -n pangeo-notebook --file https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/2021.12.02/pangeo-notebook/conda-linux-64.lock
```
Note that this will only work on linux environments, since `conda-linux-64.lock` is specific to linux.

### Image tagging and "continuous building"
This repository uses [GitHub Actions](https://help.github.com/en/actions) to build images, run tests, and push images to [DockerHub](https://hub.docker.com/orgs/pangeo).

* Pull requests from forks trigger rebuilding all images

* `pangeo/base-notebook:master` corresponds to the current "staging" image, in sync with the master branch. It is built with every commit to master and is also tagged with the short Git commit SHA, for example `pangeo/base-notebook:2639bd3`.

* Tags pushed to GitHub manually represent "production" releases with corresponding tags on DockerHub `pangeo/pangeo-notebook:2020.03.11`. The `latest` tag also corresponds to the most recent GitHub tag.


### How to build images through CI
A common need is to update conda package versions in these images. To do so: 1) fork this repo, 2) edit `pangeo-notebook/environment.yml` on your fork, 3) create a PR. Compatible package versions are then solved with `conda-lock`, and the resulting lock file is automatically added as a commit in your PR.


### How to build images locally
You'll need at least Conda installed, and Docker if you want to build and test locally.
```
# create a fork of this repo and clone it locally
git clone https://github.com/mygithub/pangeo-docker-images
cd pangeo-docker-images
# Install conda-lock
conda env create -f environment-condalock.yml
git checkout -b change-pangeo-notebook
```

Edit `pangeo-notebook/environment.yml` to change packages! Note that `make pangeo-notebook` is a convenient shortcut to build and test. See the Makefile for specific commands that are run. For example, you can just run conda-lock and don't have to run Docker to build and test locally.
```
make pangeo-notebook
git commit -a -m "added x packages, changed x version"
git push
# go to github to create PR, or use github cli https://cli.github.com
```

### Design:

##### Goals:
1. compatible with [Pangeo BinderHubs](https://github.com/pangeo-data/pangeo-binder) and [JupyterHubs](https://github.com/pangeo-data/pangeo-cloud-federation)
1. compatible with [Repo2Docker Python configuration files](https://repo2docker.readthedocs.io/en/latest/config_files.html)
1. reproducible build process and explicit conda package lists
1. small size, fast build
1. easy to customize

Everything stems from the `Dockerfile` in the `base-image` folder. The `base-image` configures default settings for Conda and Dask with `condarc.yml` and `dask_config.yml` files. The `base-image` is not meant to run on its own; it is the common foundation for `-notebook` images that install Python packages including JupyterLab and lab extensions. Lists of Conda packages for each image are specified in an `environment.yml` in each `-notebook` folder, and compatible Dask and Jupyter packages are guaranteed by specifying the `pangeo-notebook` [conda metapackage](https://github.com/conda-forge/pangeo-notebook-feedstock).

You can pre-solve for compatible environments locally with [conda-lock](https://github.com/mariusvniekerk/conda-lock/blob/master/README.md) to convert the `environment.yml` file to a [conda-linux-64.lock](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#building-identical-conda-environments) file which is an explicit list of compatible packages solved by Conda. The major advantage of doing this is that if you rebuild at a later date the resulting Conda environment is identical, which improves reproducibility. For this reason, when building off of the `base-image`, any existing `conda-linux-64.lock` file takes precedence over the `environment.yml` file.
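
The precedence rule described above can be sketched in a few lines of Python. This is only an illustration of the stated behavior, not the logic actually used by the `base-image` build; the function name `pick_environment_spec` is hypothetical.

```python
from pathlib import Path


def pick_environment_spec(image_dir):
    """Return the file that should define the conda environment.

    Hypothetical helper: mirrors the rule that an existing
    conda-linux-64.lock takes precedence over environment.yml.
    """
    image_dir = Path(image_dir)
    lock_file = image_dir / "conda-linux-64.lock"
    env_file = image_dir / "environment.yml"
    if lock_file.exists():
        # Explicit, pre-solved package list: rebuilds stay identical.
        return lock_file
    if env_file.exists():
        # Loose specification: conda has to solve it again at build time.
        return env_file
    return None
```

Calling `pick_environment_spec("pangeo-notebook")` from the repository root would return the lock file when one has been committed, and fall back to `environment.yml` otherwise.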

### Environment

The runtime environment sets two variables by default:

1. `$PANGEO_ENV`: name of the conda environment.
2. `$PANGEO_SCRATCH`: a URL like `gcs://pangeo-scratch/username/` that
   points to a cloud storage bucket for temporary storage. This is set
   if the variables `$PANGEO_SCRATCH_PREFIX` and `$JUPYTERHUB_USER`
   are detected. The prefix should be like `s3://pangeo-scratch`.
   This is not present in the `forge/` image.
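
As a rough sketch of how `$PANGEO_SCRATCH` could be derived from the two detected variables, assuming the value is simply the prefix followed by the user name and a trailing slash (the function `pangeo_scratch` is hypothetical and is not the startup code shipped in the images):

```python
import os


def pangeo_scratch(environ=os.environ):
    """Return the scratch URL, or None if either input variable is missing.

    Assumption: the URL has the form "<prefix>/<user>/".
    """
    prefix = environ.get("PANGEO_SCRATCH_PREFIX")
    user = environ.get("JUPYTERHUB_USER")
    if prefix and user:
        # Avoid a doubled slash if the prefix already ends with one.
        return f"{prefix.rstrip('/')}/{user}/"
    return None
```

With `PANGEO_SCRATCH_PREFIX=gcs://pangeo-scratch` and `JUPYTERHUB_USER=username`, this returns `gcs://pangeo-scratch/username/`, matching the example URL above.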


### Other notes

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
36 changes: 36 additions & 0 deletions docs/conf.py
@@ -0,0 +1,36 @@
project = 'Pangeo Docker Stacks'
copyright = '2023, Pangeo Contributors'
author = 'Pangeo Contributors'


extensions = [
    "myst_parser",
]

myst_enable_extensions = [
    "deflist",
    "colon_fence",
    "linkify",
]

source_suffix = [".rst", ".md"]

templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_book_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
37 changes: 37 additions & 0 deletions docs/contributing.md
@@ -0,0 +1,37 @@
# Contributing

## How to build images locally

You'll need at least Conda installed, and Docker if you want to build and test locally.

```
# create a fork of this repo and clone it locally
git clone https://github.com/mygithub/pangeo-docker-images
cd pangeo-docker-images
# Install conda-lock
conda env create -f environment-condalock.yml
git checkout -b change-pangeo-notebook
```

Edit `pangeo-notebook/environment.yml` to change packages! Note that `make pangeo-notebook` is a convenient shortcut to build and test. See the Makefile for the specific commands that are run. For example, you can run only conda-lock, without using Docker to build and test locally.

```
make pangeo-notebook
git commit -a -m "added x packages, changed x version"
git push
# go to github to create PR, or use github cli https://cli.github.com
```

## How to build images through CI

A common need is to update conda package versions in these images. To do so: 1) fork this repo, 2) edit `pangeo-notebook/environment.yml` on your fork, 3) create a PR. Compatible package versions are then solved with `conda-lock`, and the resulting lock file is automatically added as a commit in your PR.

## Image tagging and "continuous building"

This repository uses [GitHub Actions](https://help.github.com/en/actions) to build images, run tests, and push images to [DockerHub](https://hub.docker.com/orgs/pangeo).

* Pull requests from forks trigger rebuilding all images

* `pangeo/base-notebook:master` corresponds to the current "staging" image, in sync with the master branch. It is built with every commit to master and is also tagged with the short Git commit SHA, for example `pangeo/base-notebook:2639bd3`.

* Tags pushed to GitHub manually represent "production" releases, with corresponding tags on DockerHub such as `pangeo/pangeo-notebook:2020.03.11`. The `latest` tag also corresponds to the most recent GitHub tag.
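
The tagging rules above can be summarized as a small mapping from a Git ref to Docker tags. The sketch below is an illustration only: the real logic lives in the repository's GitHub Actions workflows, which are not shown here, and the function name `docker_tags` and the 7-character short SHA (inferred from the `2639bd3` example) are assumptions.

```python
def docker_tags(image, git_ref, sha):
    """Return the Docker tags that a push to `git_ref` would produce.

    Hypothetical helper; `sha` is the full commit SHA of the push.
    """
    if git_ref == "refs/heads/master":
        # "Staging" build: moving `master` tag plus an immutable short-SHA tag.
        return [f"{image}:master", f"{image}:{sha[:7]}"]
    if git_ref.startswith("refs/tags/"):
        # "Production" release: the Git tag name plus the moving `latest` tag.
        tag = git_ref[len("refs/tags/"):]
        return [f"{image}:{tag}", f"{image}:latest"]
    # Other refs (for example pull requests) are rebuilt; no published tags
    # are described for them above, so none are returned here.
    return []
```

For example, `docker_tags("pangeo/pangeo-notebook", "refs/tags/2020.03.11", sha)` yields the `2020.03.11` and `latest` tags, regardless of the SHA passed in.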
8 changes: 8 additions & 0 deletions docs/howto/conda-env.md
@@ -0,0 +1,8 @@
# How to install just the conda environment

If you're used to managing conda environments on your personal computer, or running a hosted JupyterLab service like [Google Colab](https://colab.research.google.com) or [AWS SageMaker Studio Lab](https://studiolab.sagemaker.aws), you can exactly match a tagged pangeo-notebook conda environment. For example, below we install the `pangeo-notebook` environment tagged on `2021.12.02`:

```
conda create -n pangeo-notebook --file https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/2021.12.02/pangeo-notebook/conda-linux-64.lock
```
Note that this will only work in Linux environments, since `conda-linux-64.lock` is specific to Linux.
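
If you want to script this for different release tags, the lock file URL can be assembled from the tag name. The following is a minimal sketch; the helper names are hypothetical, and it only builds the command shown above rather than running it.

```python
import platform


def lock_file_url(tag, image="pangeo-notebook"):
    """Build the raw GitHub URL of the lock file for a tagged release."""
    return (
        "https://raw.githubusercontent.com/pangeo-data/pangeo-docker-images/"
        f"{tag}/{image}/conda-linux-64.lock"
    )


def conda_create_command(tag, env_name="pangeo-notebook"):
    """Return the `conda create` command as an argument list."""
    return ["conda", "create", "-n", env_name, "--file", lock_file_url(tag)]


def host_is_linux():
    """The lock file targets Linux; a fuller check would also compare the CPU architecture."""
    return platform.system() == "Linux"
```

`" ".join(conda_create_command("2021.12.02"))` reproduces the command above, and `host_is_linux()` can be used to warn early on other operating systems.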
3 changes: 3 additions & 0 deletions docs/howto/custom-image.md
@@ -0,0 +1,3 @@
# Build a custom image

Advanced users may want a highly customized environment that still works on Pangeo BinderHubs. You can do that by building off the pangeo `base-image` following our [template repository example](https://github.com/pangeo-data/pangeo-binder-template). Further documentation on the configuration files in the `binder` subfolder can be found in the [repo2docker documentation](https://repo2docker.readthedocs.io/en/latest/config_files.html#configuration-files).
59 changes: 59 additions & 0 deletions docs/howto/launch.md
@@ -0,0 +1,59 @@
# How to launch a notebook using these images

## How to use the `pangeo-notebook` image with Binder

A major use case for these images is running an ephemeral server in the cloud with BinderHub. Anyone can launch a server running the latest `pangeo-notebook` image with the following URL:

* https://mybinder.org/v2/gh/pangeo-data/pangeo-docker-images/HEAD?urlpath=lab

NOTE: the link above resolves to the [`pangeo-notebook` image](https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook) and not `base-notebook`, `ml-notebook` or `pytorch-notebook` that are also defined in this repository. Currently BinderHubs map to a single image definition per repository.

### Use nbgitpuller to automatically load content

The Binder link above will launch JupyterLab without any notebooks or other content. From JupyterLab you can then upload notebooks or run `git pull` commands to retrieve content from another GitHub repository. However, it is often useful to pre-load content when a server launches, and the [nbgitpuller link generator](https://jupyterhub.github.io/nbgitpuller/link) makes this easy.

Below is a link to illustrate launching [`pangeo-notebook/2021.09.30`](https://github.com/pangeo-data/pangeo-docker-images/blob/2021.09.30/pangeo-notebook/packages.txt) and automatically pulling the notebooks housed in https://github.com/pangeo-data/cog-best-practices.

* https://mybinder.org/v2/gh/pangeo-data/pangeo-docker-images/2021.09.30?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252Fpangeo-data%252Fcog-best-practices%26urlpath%3Dlab%252Ftree%252Fcog-best-practices%252F%26branch%3Dmain
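
The link above nests one URL inside another, so the inner nbgitpuller query has to be percent-encoded twice. The sketch below reconstructs that link with Python's standard library to show the encoding steps; the function name and parameter names are hypothetical, and the link generator remains the easier route in practice.

```python
from urllib.parse import quote, urlencode


def binder_nbgitpuller_url(image_ref, content_repo, branch, open_path):
    """Build a mybinder.org link that pulls `content_repo` at launch.

    image_ref: tag, branch or commit of pangeo-docker-images to launch.
    open_path: JupyterLab path to open after the pull.
    """
    # First encoding: the nbgitpuller query string itself.
    inner = "git-pull?" + urlencode(
        {"repo": content_repo, "urlpath": open_path, "branch": branch}
    )
    # Second encoding: the whole inner path becomes one Binder `urlpath` value.
    return (
        "https://mybinder.org/v2/gh/pangeo-data/pangeo-docker-images/"
        f"{image_ref}?urlpath={quote(inner, safe='')}"
    )


print(
    binder_nbgitpuller_url(
        "2021.09.30",
        "https://github.com/pangeo-data/cog-best-practices",
        "main",
        "lab/tree/cog-best-practices/",
    )
)
```

This is why `/` appears as `%252F` in the final link: it is first encoded to `%2F`, and the `%` is then encoded again to `%25`.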

## How to launch Jupyterlab locally with one of these images

```
docker run -it --rm -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0
```
**NOTE:** images are mirrored on quay.io so you can also pull `quay.io/pangeo/pangeo-notebook:latest`

To access files on your local hard drive from within JupyterLab running in Docker, you need to use a Docker [volume mount](https://docs.docker.com/storage/volumes/). The following command mounts your home directory in the Docker container and launches JupyterLab from there.

```
docker run -it --rm --volume $HOME:$HOME -p 8888:8888 pangeo/pangeo-notebook:latest jupyter lab --ip 0.0.0.0 $HOME
```

You can also run commands other than `jupyter` when starting a Docker container:

```
docker run -it --rm pangeo/base-notebook:2021.09.30 /bin/bash
```

If you're doing Machine Learning and want to use NVIDIA GPUs,
follow the instructions at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
to install `nvidia-container-toolkit`, and then start the Docker container like so:

```
docker run -it --rm --gpus all -p 8888:8888 pangeo/pytorch-notebook:master jupyter lab --ip 0.0.0.0
```
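
The `docker run` variants in this section differ only in a few flags. As a sketch, they can be assembled from one hypothetical helper, using only the flags shown above; the function name and its parameters are illustrative, and the container port is assumed to stay at Jupyter's default of 8888.

```python
import shlex
from pathlib import Path


def docker_run_command(image, mount_home=False, gpus=False, host_port=8888):
    """Return the argument list for launching JupyterLab from `image`."""
    home = str(Path.home())
    cmd = ["docker", "run", "-it", "--rm"]
    if mount_home:
        # Mount the home directory at the same path inside the container.
        cmd += ["--volume", f"{home}:{home}"]
    if gpus:
        # Requires the NVIDIA container toolkit on the host.
        cmd += ["--gpus", "all"]
    cmd += ["-p", f"{host_port}:8888", image, "jupyter", "lab", "--ip", "0.0.0.0"]
    if mount_home:
        # Start JupyterLab in the mounted directory.
        cmd.append(home)
    return cmd


print(shlex.join(docker_run_command("pangeo/pytorch-notebook:master", gpus=True)))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the container from a script.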

## How to launch an image with a Cloud provider on your own account

Many Cloud providers offer services to run Docker containers in their data centers.
Instructions will vary, so we don't provide specifics here, but as an example,
check out these docs for running containers on the cloud via Docker Compose:

- [Amazon Elastic Container Service (ECS)](https://docs.docker.com/cloud/ecs-integration)
- [Azure Container Instances (ACI)](https://docs.docker.com/cloud/aci-integration)

## GitHub Codespaces (Azure)

You can launch the pangeo-notebook environment via [GitHub Codespaces](https://github.com/features/codespaces) with this button:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/pangeo-data/pangeo-docker-images?quickstart=1)
25 changes: 25 additions & 0 deletions docs/index.md
@@ -0,0 +1,25 @@
# Welcome to Pangeo Docker Images documentation!

![Build Status](https://github.com/pangeo-data/pangeo-docker-images/workflows/Build/badge.svg)
![Publish Status](https://github.com/pangeo-data/pangeo-docker-images/workflows/Publish/badge.svg)
![DockerHub Version](https://img.shields.io/docker/v/pangeo/base-image?sort=date)

Curated sets of Docker images for Earth data analysis!

# How do I...?

```{toctree}
:maxdepth: 2
howto/launch.md
howto/custom-image.md
howto/conda-env.md
contributing.md
```

# Topic explainers

```{toctree}
:maxdepth: 2
topic/design.md
topic/environment-variables.md
```