Getting started on Della
- Make sure you use the VPN.
- Run

  ```bash
  ssh <username>@della.princeton.edu
  ```

- Type `checkquota` to check your quota.
- Disk space in your home directory is limited, so environments should be created in `/scratch/gpfs`:

  ```bash
  cd /scratch/gpfs
  mkdir $USER
  ```

  `$USER` is an environment variable that holds your user name (you do not have to replace it).
- Install micromamba into your scratch directory:

  ```bash
  export MAMBA_ROOT_PREFIX=/scratch/gpfs/$USER/micromamba
  curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
  eval "$(./bin/micromamba shell hook -s posix)"
  ./bin/micromamba shell init -s bash -p /scratch/gpfs/$USER/micromamba/
  ```

- Let's check:

  ```bash
  echo $MAMBA_ROOT_PREFIX
  ```

  The output should point to `/scratch/gpfs/$USER/micromamba`.
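If you prefer this check to fail loudly instead of eyeballing the output, a small guard like the following works (a sketch, not part of the official instructions):

```shell
# Compare MAMBA_ROOT_PREFIX against the expected scratch location
expected="/scratch/gpfs/${USER:-$(id -un)}/micromamba"
if [ "${MAMBA_ROOT_PREFIX:-}" = "$expected" ]; then
    echo "MAMBA_ROOT_PREFIX looks good: $MAMBA_ROOT_PREFIX"
else
    echo "MAMBA_ROOT_PREFIX is '${MAMBA_ROOT_PREFIX:-unset}', expected '$expected'" >&2
fi
```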
- Install the `gnn_tracking` package and run the tests:

  ```bash
  cd
  git clone https://github.com/gnn-tracking/gnn_tracking.git
  cd gnn_tracking/environments
  micromamba env create --name gnn --file default.yml
  cd ..
  # you should be in the top dir of the gnn_tracking package now
  pytest
  ```
If you haven't already done so, install the software (see the first sections of this document). If you are using the global Anaconda installation instead of micromamba, you can skip to the next section.
Some context for what we're doing: if we had used the global Anaconda installation for our environment rather than micromamba, everything would have worked out of the box. For micromamba, however, we need a small hack so that the web GUI finds our installation, namely we have to modify our [`$PATH`](https://www.digitalocean.com/community/tutorials/how-to-view-and-update-the-linux-path-environment-variable) environment variable. The only way the web GUI lets us do this is by loading a custom [environment module](https://modules.sourceforge.net/), so we have to write our own small module.

- Change directories to your scratch directory:

  ```bash
  cd /scratch/gpfs/$USER
  ```
- Create a new file called `micromamba-module`:

  ```bash
  touch micromamba-module
  ```

- Open it in an editor, e.g. with

  ```bash
  nano micromamba-module
  ```

- Add the following lines:

  ```
  #%Module1.0
  module-whatis "Set PATH for my mamba env"
  prepend-path PATH "/scratch/gpfs/kl5675/micromamba/envs/gnn/bin/"
  ```

  In the last line, replace `kl5675` with your user name (assuming that you named your environment `gnn`).
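Equivalently, you can write the file in one go with a heredoc; since the shell expands `$USER` for you, no manual replacement is needed (a sketch, assuming your environment is named `gnn` and you are in your scratch directory):

```shell
# Write the modulefile in one command; $USER is expanded by the shell
cat > micromamba-module <<EOF
#%Module1.0
module-whatis "Set PATH for my mamba env"
prepend-path PATH "/scratch/gpfs/$USER/micromamba/envs/gnn/bin/"
EOF
```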
Let's double check a few things (`$` is used to separate input and output):

```
$ file /scratch/gpfs/kl5675/micromamba/envs/gnn/bin/python
/scratch/gpfs/kl5675/micromamba/envs/gnn/bin/python: symbolic link to python3.10
```

If the Python version is different, that's fine (but it should be >= 3.10).
Let's do another check:

```
$ module load $PWD/micromamba-module
$ which python
/scratch/gpfs/kl5675/micromamba/envs/gnn/bin/python
```

(The first command requires the absolute path to the module file, hence we prefix it with the current working directory `$PWD`.) The output of the second command should point to the `python` file in your micromamba environment.
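Loading the module does nothing more than prepend the environment's `bin` directory to `$PATH`; you can mimic the effect by hand if `module` is unavailable for testing (a sketch, assuming the env name `gnn`):

```shell
# Same effect as `module load`: put the env's bin directory first on PATH
export PATH="/scratch/gpfs/$USER/micromamba/envs/gnn/bin:$PATH"
echo "$PATH" | cut -d: -f1   # first entry should now be the env's bin dir
```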
Finally, let's make sure that `jupyter lab` is available:

```bash
micromamba activate gnn
jupyter lab
```

If you see Jupyter Lab starting (lots of colorful output), it's already installed; you can quit it by hitting `Ctrl-C` a couple of times. If you instead only see a help message that ends with

```
Jupyter command `jupyter-lab` not found.
```

then run

```bash
micromamba install -c conda-forge jupyterlab
```

to install it.
OK, we're all set :)
- Connect to the Princeton VPN
- Go to https://mydella.princeton.edu/
- From the top bar, choose "Interactive apps" > "Jupyter"
You need to enter the following information:
- Number of hours: you decide. Your job will be killed after the time is up, but the fewer hours you request, the shorter your initial wait in the queue.
- Custom partition: default
- Node type: any
- Number of cores: depends on what you're doing
- Memory allocated: depends on what you're doing
- Anaconda 3 version: custom
- Custom environment module paths: `/scratch/gpfs/kl5675/`, where you replace `kl5675` with your own username
- Modules to load instead of the default: `micromamba-module`
- Extra SLURM options: `--gres=gpu:1`. If you require 80GB GPUs instead of 40GB ones, add `--constraint=gpu80` (also see the last section of this document)
- How to handle conda environments from your home directory: "Only use those conda envs that already have ipykernel installed"
- Click "Launch"
The status will first be "Queued", then change to "Starting", then "Running".
- Click the "Connect to Jupyter" button.
- Navigate to an existing notebook or click "New" > "Python3 (ipykernel)". Do not choose one of the anaconda options.
- You should now have a Jupyter notebook in your environment.
Let's check that everything is working. Type:

```python
# check that the import is working
import gnn_tracking

# check that we have a GPU
import torch
torch.cuda.is_available()  # should show True
```
After you finish your calculations, go back to the della web portal and click "Delete" for the running Jupyter session.
You can access logs for the SLURM job that was created under the hood from the web GUI:
To do this, click on the link after "Session ID". You'll see a file browser; click on `output.log` to view the log. If this file doesn't exist yet, your job probably hasn't started yet (it's still queued). Whenever something goes wrong, you should copy this log and attach it to your report.
After you leave the "Interactive app" page, you will no longer find this link. However, the files are in your home directory on Della, for example at `/home/$USER/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/dd4824c3-866d-4cbe-b502-a004d610bfcc` (the hash will be different).
Please also include the files `user_defined_context.json` and `job_script_options.json` in your report.
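Since the session hash changes with every launch, it can be handy to list the session directories newest-first (a sketch; the path is the one shown above):

```shell
# List OnDemand Jupyter session directories, newest first
out=$HOME/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output
ls -td "$out"/*/ 2>/dev/null | head -n 1   # most recent session directory
```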
This is how the log looks if it is successful:

```
Script starting...
Waiting for Jupyter server to open port 19096...
Starting main script...
TTT - Sat Jul 1 20:47:24 EDT 2023
Creating env launcher wrapper script...
Creating launcher wrapper script...
TTT - Sat Jul 1 20:47:24 EDT 2023
Creating custom Jupyter kernels...
cp: cannot stat '/home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/assets/python_/*': No such file or directory
TTT - Sat Jul 1 20:47:24 EDT 2023
Creating custom Jupyter kernels from user-created Conda environments...
Creating kernel for /home/kl5675/.conda/envs/*/...
EnvironmentLocationNotFound: Not a conda environment: /home/kl5675/.conda/envs/*
TTT - Sat Jul 1 20:47:24 EDT 2023
Creating custom Jupyter kernels from local anaconda installations...
Currently Loaded Modulefiles:
 1) micromamba-module
TTT - Sat Jul 1 20:47:24 EDT 2023
+ jupyter kernelspec list
Available kernels:
  sys_python27      /home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/share/jupyter/kernels/sys_python27
  sys_python36      /home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/share/jupyter/kernels/sys_python36
  sys_python37      /home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/share/jupyter/kernels/sys_python37
  sys_python37_2    /home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/share/jupyter/kernels/sys_python37_2
  sys_python38      /home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/share/jupyter/kernels/sys_python38
  python3           /scratch/gpfs/kl5675/micromamba/envs/gnn/share/jupyter/kernels/python3
TTT - Sat Jul 1 20:47:28 EDT 2023
+ jupyter notebook --config=/home/kl5675/ondemand/data/sys/dashboard/batch_connect/sys/jupyter/output/65d258e3-ad5c-4d45-9ef5-b433fc3934dd/config.py
[I 20:47:30.006 NotebookApp] Serving notebooks from local directory: /
[I 20:47:30.006 NotebookApp] Jupyter Notebook 6.5.4 is running at:
[I 20:47:30.006 NotebookApp] http://della-l09g2:19096/node/della-l09g2/19096/
[I 20:47:30.006 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Discovered Jupyter server listening on port 19096!
Generating connection YAML file...
[I 20:47:46.288 NotebookApp] 302 POST /node/della-l09g2/19096/login (172.17.2.9) 1.220000ms
[I 20:47:46.401 NotebookApp] 302 GET /node/della-l09g2/19096/ (172.17.2.9) 0.460000ms
[W 20:48:56.752 NotebookApp] Notebook home/kl5675/Documents/23/git_sync/tutorials/notebooks/009_build_graphs_ml.ipynb is not trusted
[I 20:48:57.493 NotebookApp] Kernel started: 10d06d3a-1927-423f-ba19-4f5e27036228, name: python3
```
Check out the simpler version above
- [your machine] Connect to the Princeton VPN
- Think of a random number between 6000 and 9000. We'll use 8945 (but yours MUST be different)
- [your machine] Log in to della-gpu:

  ```bash
  ssh -L 8945:localhost:8945 <username>@della-gpu.princeton.edu
  ```

- [della-gpu] Start tmux by typing `tmux`
- [della-gpu] Split the window in two with `Ctrl-B %`
- [della-gpu] In the left window, allocate resources with

  ```bash
  salloc --nodes=1 --ntasks=1 --time=01:00:00 --cpus-per-task=1
  ```

  This is going to log you into a compute node. It will tell you the node's name, e.g., `della-r4c4n13` (we'll need that later)
- [della-r4c4n13] Run `micromamba activate gnn`
- [della-r4c4n13] Run `jupyter notebook --port 8945 --no-browser`; double check that the URLs that are displayed contain `localhost:8945`
- [della-gpu] In the right window, type `ssh -N -L 8945:localhost:8945 della-r4c4n13` (the node name from before)
- [your machine] Open your browser and paste the link shown in step 8
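If you'd rather not invent a port number yourself, you can let the shell pick one in the suggested range (a sketch; `shuf` is part of GNU coreutils and available on the cluster's Linux nodes):

```shell
# Pick a random port between 6000 and 9000 to use in both ssh commands
port=$(shuf -i 6000-9000 -n 1)
echo "Using port $port"
```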
This page has all the information on the Della cluster.
In particular, you will need to modify the above command as follows. To get a node with either a 40GB or an 80GB GPU:

```bash
salloc --nodes=1 --ntasks=1 --time=01:00:00 --cpus-per-task=1 --gres=gpu:1
```

To get a node with an 80GB GPU:

```bash
salloc --nodes=1 --ntasks=1 --time=01:00:00 --cpus-per-task=1 --gres=gpu:1 --constraint=gpu80
```