
Merge pull request #14 from theovincent/clean_exp
Clean exp
theovincent authored Jan 26, 2023
2 parents c7fcfa2 + 99246ea commit 7f3bbe4
Showing 209 changed files with 8,595 additions and 13,993 deletions.
10 changes: 10 additions & 0 deletions .dockerignore
@@ -15,9 +15,19 @@ Dockerfile
**/*.pdf

# Avoid pushing experiments output
experiments/lqr/figures
experiments/chain_walk/figures
experiments/car_on_hill/figures
experiments/bicycle_offline/figures
experiments/bicycle_online/figures
experiments/acrobot/figures
experiments/lunar_lander/figures
**/*.out

# Save optimal value functions of car on hill
!experiments/chain_walk/figures/data/optimal/Q.npy
!experiments/chain_walk/figures/data/optimal/V.npy
!experiments/lqr/figures/data/optimal/W.npy
!experiments/lqr/figures/data/optimal/V.npy
!experiments/car_on_hill/figures/data/optimal/Q.npy
!experiments/car_on_hill/figures/data/optimal/V.npy
13 changes: 10 additions & 3 deletions .gitignore
@@ -145,12 +145,19 @@ dmypy.json
**/*.pdf

# Avoid pushing experiments output
experiments/bicycle/figures
experiments/lqr/figures
experiments/chain_walk/figures
experiments/car_on_hill/figures
experiments/bicycle_offline/figures
experiments/bicycle_online/figures
experiments/acrobot/figures
experiments/lunar_lander/figures
experiments/lunar_lander/transfer_from_ias_cluster.sh
**/*.out

# Save optimal value functions of car on hill
# Save optimal values
!experiments/chain_walk/figures/data/optimal/Q.npy
!experiments/chain_walk/figures/data/optimal/V.npy
!experiments/lqr/figures/data/optimal/W.npy
!experiments/lqr/figures/data/optimal/V.npy
!experiments/car_on_hill/figures/data/optimal/Q.npy
!experiments/car_on_hill/figures/data/optimal/V.npy
161 changes: 9 additions & 152 deletions README.md
@@ -2,188 +2,45 @@

## User installation
### Without Docker, with Python 3.8 or 3.9 installed
In the folder where the code is, create a Python virtual environment, activate it and install the package and its dependencies in editable mode:
In the folder where the code is, create a Python virtual environment, activate it, update pip and install the package and its dependencies in editable mode:
```bash
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -e .
```
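To confirm that the editable install worked, you can check that Python finds the `pbo` package (the main code folder, imported as `pbo` throughout the experiments). A minimal sketch; `is_importable` is a hypothetical helper, not part of the repository:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module `module_name` without importing it."""
    # find_spec returns None when no finder on sys.path can locate the module.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Inside the activated virtual environment, after `pip install -e .`, this should print True.
    print(is_importable("pbo"))
```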

### With Docker
Please see the [README](./docker/README.md) file made for that.

## Run the experiments
For an `environment` and an `algorithm`, a Jupyter notebook running the associated experiment can be found at _experiments/[environment]/[algorithm].ipynb_.

For example, the jupyter notebook _experiments/chain_walk/PBO_linear.ipynb_ trains a linear PBO on the Chain-Walk environment.

To generate the plots with $N$ seeds and $K$ Bellman iterations, first generate the data by running `./experiments/[environment]/run_seeds.sh --n_seeds N --n_bellman_iterations K`
and then run the Jupyter notebook _experiments/[environment]/plots.ipynb_.
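Once the per-seed data exists, the plotting step combines it across seeds. The sketch below shows one way to do that with NumPy, computing a mean and a standard error per Bellman iteration. It assumes the `{max_bellman_iterations}_J_{seed}.npy` file naming used by `experiments/acrobot/DQN_evaluate.py` in this commit; `load_seed_results` and `mean_and_stderr` are hypothetical helpers, and the notebooks may aggregate differently.

```python
from pathlib import Path

import numpy as np


def load_seed_results(folder: Path, n_bellman_iterations: int, seeds: list) -> np.ndarray:
    """Stack per-seed arrays saved as `{K}_J_{seed}.npy` into shape (n_seeds, K + 1)."""
    return np.stack([np.load(folder / f"{n_bellman_iterations}_J_{seed}.npy") for seed in seeds])


def mean_and_stderr(results: np.ndarray) -> tuple:
    """Per-iteration mean and standard error over the seed axis (axis 0), ignoring NaNs."""
    mean = np.nanmean(results, axis=0)
    if results.shape[0] < 2:
        # A standard error needs at least two seeds.
        return mean, np.full_like(mean, np.nan)
    n_valid = np.sum(~np.isnan(results), axis=0)
    std = np.nanstd(results, axis=0, ddof=1)  # sample standard deviation
    return mean, std / np.sqrt(n_valid)
```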

### Replicate figures
To replicate Figure 4a with one seed, run:
```Bash
./experiments/chain_walk/run_seeds.sh --n_seeds 1 --n_bellman_iterations 5
jupyter nbconvert --to notebook --inplace --execute experiments/chain_walk/plots.ipynb
```
You will find Figure 4a at _experiments/chain_walk/figures/distance_to_optimal_V_5.pdf_. The code should take around 5 minutes to run.

To replicate Figure 4b with one seed, run:
```Bash
./experiments/lqr/run_seeds.sh --n_seeds 1 --n_bellman_iterations 2
jupyter nbconvert --to notebook --inplace --execute experiments/lqr/plots.ipynb
```
You will find Figure 4b at _experiments/lqr/figures/distance_to_optimal_Pi_2.pdf_. The code should take around 2 minutes to run.

To replicate Figure 5a with one seed, run:
```Bash
./experiments/car_on_hill/run_seeds.sh --n_seeds 1 --n_bellman_iterations 9
jupyter nbconvert --to notebook --inplace --execute experiments/car_on_hill/samples.ipynb
jupyter nbconvert --to notebook --inplace --execute experiments/car_on_hill/plots.ipynb
```
You will find Figure 5a at _experiments/car_on_hill/figures/distance_to_optimal_V_9.pdf_. The code should take around 30 minutes to run.
All the experiments can be run the same way by replacing the name of the environment. Here is an example for LQR.

To replicate Figure 5b with one seed, run:
The following command runs the training and evaluation of all the algorithms, one after the other:
```Bash
./experiments/bicycle/run_seeds_FQI.sh --n_seeds 1 --n_bellman_iterations 8
./experiments/bicycle/run_seeds_PBO_deep.sh --n_seeds 1 --n_bellman_iterations 8
./experiments/bicycle/run_seeds_PBO_linear.sh --n_seeds 1 --n_bellman_iterations 8
jupyter nbconvert --to notebook --inplace --execute experiments/bicycle/plots.ipynb
launch_job/lqr/launch_local.sh --experiment_name test --max_bellman_iterations 3 --first_seed 1 --last_seed 1
```
You will find Figure 5b at _experiments/bicycle/figures/seconds_8.pdf_. The code should take around 3 hours to run.

If you encounter any problem, make sure your files match the [file organization](#file-organization) and that the parameters in _experiments/[environment]/plots.ipynb_ match the data computed so far.
The expected time to finish the runs is 1 minute.

## Run Car On Hill
```Bash
car_on_hill_sample # to collect the offline dataset
car_on_hill_fqi -b 9 -s 1 # to train and evaluate FQI
car_on_hill_pbo -a linear -b 9 -s 1 # to train a linear PBO
car_on_hill_pbo_evaluate -a linear -b 9 -s 1 # to evaluate it
car_on_hill_pbo -a deep -b 9 -s 1 # to train a deep PBO
car_on_hill_pbo_evaluate -a deep -b 9 -s 1 # to evaluate it
```
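These commands have to run in order: sampling first, then training, then evaluation. A small driver can keep them in sequence. This is a sketch that builds the argument lists and runs them with `subprocess`; it assumes, from the examples above rather than from the CLIs' help text, that `-b` is the number of Bellman iterations, `-s` the seed and `-a` the PBO architecture. `car_on_hill_pipeline` and `run_pipeline` are hypothetical helpers.

```python
import subprocess


def car_on_hill_pipeline(bellman_iterations: int, seed: int) -> list:
    """Return the car-on-hill commands listed above, in execution order."""
    common = ["-b", str(bellman_iterations), "-s", str(seed)]
    commands = [["car_on_hill_sample"], ["car_on_hill_fqi", *common]]
    for architecture in ("linear", "deep"):
        # Train a PBO of the given architecture, then evaluate it.
        commands.append(["car_on_hill_pbo", "-a", architecture, *common])
        commands.append(["car_on_hill_pbo_evaluate", "-a", architecture, *common])
    return commands


def run_pipeline(commands: list) -> None:
    """Run each command; check=True raises CalledProcessError at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

`run_pipeline(car_on_hill_pipeline(9, 1))` would then reproduce the six commands above, assuming the entry points are on the `PATH` of the activated virtual environment.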
Once all the training runs are done, you can generate the figures shown in the paper by running the Jupyter notebook at *experiments/lqr/plots.ipynb*. In the first cell of the notebook, make sure to set *experiment_name*, *max_bellman_iterations* and *seeds* according to the training you have run. You can also inspect the training loss through the Jupyter notebook *experiments/lqr/plots_loss.ipynb*.

## Run the tests
Run all tests with
```Bash
pytest
```
The code should take around 1 minutes to run.
The code should take around 1 minute to run.

## File organization
```
📦PBO
┣ 📂env # environment files
┣ 📂experiments # files to run the experiments
┃ ┣ 📂bicycle
┃ ┃ ┣ 📂figures
┃ ┃ ┃ ┣ 📂data
┃ ┃ ┃ ┃ ┣ 📂FQI
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy # data generated by running the experiments
┃ ┃ ┃ ┃ ┣ 📂PBO_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_linear_max_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┗ 📂optimal
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┗ 📜...pdf # plots of the experiments generated from plots.ipynb
┃ ┃ ┣ 📜FQI.ipynb
┃ ┃ ┣ 📜PBO_linear_max_linear.ipynb
┃ ┃ ┣ 📜PBO_linear.ipynb
┃ ┃ ┣ 📜parameters.json
┃ ┃ ┣ 📜plots.ipynb
┃ ┃ ┗ 📜run_seeds.sh
┃ ┣ 📂car_on_hill
┃ ┃ ┣ 📂figures
┃ ┃ ┃ ┣ 📂data
┃ ┃ ┃ ┃ ┣ 📂FQI
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_linear_max_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┗ 📂optimal
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┗ 📜...pdf
┃ ┃ ┣ 📜FQI.ipynb
┃ ┃ ┣ 📜PBO_linear_max_linear.ipynb
┃ ┃ ┣ 📜PBO_linear.ipynb
┃ ┃ ┣ 📜optimal.py
┃ ┃ ┣ 📜parameters.json
┃ ┃ ┣ 📜plots.ipynb
┃ ┃ ┗ 📜run_seeds.sh
┃ ┣ 📂chain_walk
┃ ┃ ┣ 📂figures
┃ ┃ ┃ ┣ 📂data
┃ ┃ ┃ ┃ ┣ 📂FQI
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂LSPI
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_max_linear
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┃ ┗ 📂optimal
┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┗ 📜...pdf
┃ ┃ ┣ 📜FQI.ipynb
┃ ┃ ┣ 📜LSPI.ipynb
┃ ┃ ┣ 📜PBO_linear.ipynb
┃ ┃ ┣ 📜PBO_max_linear.ipynb
┃ ┃ ┣ 📜PBO_optimal.ipynb
┃ ┃ ┣ 📜optimal.ipynb
┃ ┃ ┣ 📜parameters.json
┃ ┃ ┣ 📜plots.ipynb
┃ ┃ ┗ 📜run_seeds.sh
┃ ┗ 📂lqr
┃ ┣ 📂figures
┃ ┃ ┣ 📂data
┃ ┃ ┃ ┣ 📂FQI
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┣ 📂LSPI
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┣ 📂PBO_custom_linear
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┣ 📂PBO_linear
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┣ 📂PBO_optimal
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┃ ┗ 📂optimal
┃ ┃ ┃ ┃ ┗ 📜...npy
┃ ┃ ┗ 📜...pdf
┃ ┣ 📜FQI.ipynb
┃ ┣ 📜LSPI.ipynb
┃ ┣ 📜PBO_custom_linear.ipynb
┃ ┣ 📜PBO_linear.ipynb
┃ ┣ 📜PBO_optimal.ipynb
┃ ┣ 📜optimal.ipynb
┃ ┣ 📜parameters.json
┃ ┣ 📜plots.ipynb
┃ ┗ 📜run_seeds.sh
┣ 📂test # tests for the environments and the networks
┗ 📂pbo # main code
```
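When a plot notebook fails, a quick check is whether the data folders it expects exist under `experiments/<environment>/figures/data/`, as in the file organization above. A minimal sketch with `pathlib`; `missing_data_folders` is a hypothetical helper, and the algorithm names you pass must match the folders your runs actually produced.

```python
from pathlib import Path


def missing_data_folders(root: Path, environment: str, algorithms: list) -> list:
    """Return the algorithms whose `experiments/<environment>/figures/data/<algorithm>` folder is missing."""
    data_dir = root / "experiments" / environment / "figures" / "data"
    return [algorithm for algorithm in algorithms if not (data_dir / algorithm).is_dir()]
```

For example, `missing_data_folders(Path("."), "chain_walk", ["FQI", "LSPI", "PBO_linear", "optimal"])` run from the repository root lists the folders still to be generated.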

## Using a GPU
In the folder where the code is, create a Python virtual environment, activate it and install the package and its dependencies in editable mode:
```bash
python3 -m venv env_gpu
source env_gpu/bin/activate
pip install -e .
```

If JAX does not recognize the GPU, you may need to run
```bash
pip install -U jax[cuda11_cudnn82]==0.3.22 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
Taken from https://github.com/google/jax/discussions/10323.
(Taken from https://github.com/google/jax/discussions/10323)
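To check whether JAX sees the GPU before and after that reinstall, you can query its default backend. `jax.default_backend()` is part of JAX's public API; the wrapper below is a hypothetical helper that also handles JAX not being importable.

```python
def jax_backend() -> str:
    """Return JAX's default backend (e.g. "cpu" or "gpu"), or "unavailable" if JAX cannot be imported."""
    try:
        import jax
    except ImportError:
        return "unavailable"
    # default_backend() names the platform JAX places computations on by default.
    return jax.default_backend()


if __name__ == "__main__":
    # Expect "gpu" once a CUDA-enabled jaxlib is installed correctly.
    print(jax_backend())
```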


## Using a cluster
4 changes: 3 additions & 1 deletion docker/cpu/Dockerfile
@@ -3,6 +3,8 @@ FROM python:3.8.13-buster
RUN mkdir /workspace
WORKDIR /workspace

RUN pip install --upgrade pip

COPY . .

RUN pip install -e .[cpu]
RUN pip install -e .
3 changes: 2 additions & 1 deletion docker/gpu/Dockerfile
@@ -8,5 +8,6 @@ COPY . .
RUN apt-get -y update
RUN apt-get -y install python3
RUN apt-get -y install python3-pip
RUN pip install --upgrade pip

RUN pip install -e .[gpu] && pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
RUN pip install -e . && pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
13 changes: 13 additions & 0 deletions experiments/__init__.py
@@ -0,0 +1,13 @@
colors = {
    "LSPI": "#984ea3",
    "FQI": "#e41a1c",
    "DQN": "#e41a1c",
    "ProFQI": "#4daf4a",
    "ProDQN": "#4daf4a",
    "blue": "#377eb8",
    "orange": "#ff7f00",
    "pink": "#f781bf",
    "brown": "#a65628",
    "grey": "#999999",
    "yellow": "#dede00",
}
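The palette gives FQI and DQN the same color, and ProFQI and ProDQN the same color, so related methods look alike across figures. A lookup with a fallback keeps plotting code from failing on a name missing from the dictionary. `color_for` is a hypothetical helper; the dictionary is copied from `experiments/__init__.py` so the sketch runs on its own.

```python
colors = {
    "LSPI": "#984ea3",
    "FQI": "#e41a1c",
    "DQN": "#e41a1c",
    "ProFQI": "#4daf4a",
    "ProDQN": "#4daf4a",
    "blue": "#377eb8",
    "orange": "#ff7f00",
    "pink": "#f781bf",
    "brown": "#a65628",
    "grey": "#999999",
    "yellow": "#dede00",
}


def color_for(algorithm: str, default: str = "grey") -> str:
    """Return the hex color of `algorithm`, falling back to the `default` palette entry."""
    return colors.get(algorithm, colors[default])
```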
48 changes: 48 additions & 0 deletions experiments/acrobot/DQN.py
@@ -0,0 +1,48 @@
import sys
import argparse
import json
import jax

from experiments.base.parser import addparse
from experiments.base.print import print_info


def run_cli(argvs=sys.argv[1:]):
    import warnings

    warnings.simplefilter(action="ignore", category=FutureWarning)

    parser = argparse.ArgumentParser("Train DQN on Acrobot.")
    addparse(parser, seed=True)
    args = parser.parse_args(argvs)
    print_info(args.experiment_name, "DQN", "Acrobot", args.max_bellman_iterations, args.seed)
    p = json.load(open(f"experiments/acrobot/figures/{args.experiment_name}/parameters.json"))  # p for parameters

    from experiments.acrobot.utils import (
        define_environment,
        define_q,
        collect_random_samples,
        collect_samples,
        generate_keys,
    )
    from pbo.sample_collection.replay_buffer import ReplayBuffer
    from experiments.base.DQN import train

    sample_key, exploration_key, q_key, _ = generate_keys(args.seed)

    env = define_environment(jax.random.PRNGKey(p["env_seed"]), p["gamma"])
    replay_buffer = ReplayBuffer(p["max_size"])
    collect_random_samples(env, sample_key, replay_buffer, p["n_initial_samples"], p["horizon"])
    q = define_q(
        env.actions_on_max,
        p["gamma"],
        q_key,
        p["layers_dimension"],
        learning_rate={
            "first": p["starting_lr_dqn"],
            "last": p["ending_lr_dqn"],
            "duration": args.max_bellman_iterations * p["fitting_steps_dqn"],
        },
    )

    train("acrobot", args, q, p, exploration_key, sample_key, replay_buffer, collect_samples, env)
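`DQN.py` fills a `ReplayBuffer(p["max_size"])` with random transitions before training. The repository's `pbo.sample_collection.replay_buffer.ReplayBuffer` is not shown in this commit, so the following is only a minimal sketch of what a fixed-size buffer of this kind typically does: once full, it overwrites the oldest transition, and it samples uniformly. The class name `RingReplayBuffer`, its methods and the transition fields are assumptions.

```python
import numpy as np


class RingReplayBuffer:
    """Fixed-capacity FIFO buffer of (state, action, reward, next_state, absorbing) transitions."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.transitions = []
        # Slot overwritten by the next transition once the buffer is full (always the oldest one).
        self.next_index = 0

    def __len__(self) -> int:
        return len(self.transitions)

    def add(self, state, action, reward, next_state, absorbing) -> None:
        transition = (state, action, reward, next_state, absorbing)
        if len(self.transitions) < self.max_size:
            self.transitions.append(transition)
        else:
            self.transitions[self.next_index] = transition
        self.next_index = (self.next_index + 1) % self.max_size

    def sample(self, batch_size: int, rng: np.random.Generator) -> list:
        """Draw `batch_size` transitions uniformly at random, with replacement."""
        indices = rng.integers(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in indices]
```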
72 changes: 72 additions & 0 deletions experiments/acrobot/DQN_evaluate.py
@@ -0,0 +1,72 @@
import sys
import argparse
import multiprocessing
import json
import jax
import jax.numpy as jnp
import numpy as np

from experiments.base.parser import addparse
from experiments.base.print import print_info


def run_cli(argvs=sys.argv[1:]):
    with jax.default_device(jax.devices("cpu")[0]):
        import warnings

        warnings.simplefilter(action="ignore", category=FutureWarning)

        parser = argparse.ArgumentParser("Evaluate a DQN on Acrobot.")
        addparse(parser, seed=True)
        args = parser.parse_args(argvs)
        print_info(args.experiment_name, "DQN", "Acrobot", args.max_bellman_iterations, args.seed, train=False)
        p = json.load(open(f"experiments/acrobot/figures/{args.experiment_name}/parameters.json"))  # p for parameters

        from experiments.acrobot.utils import define_environment, define_q
        from pbo.networks.learnable_q import FullyConnectedQ
        from pbo.utils.params import load_params

        env = define_environment(jax.random.PRNGKey(p["env_seed"]), p["gamma_evaluation"])

        q = define_q(env.actions_on_max, p["gamma"], jax.random.PRNGKey(0), p["layers_dimension"])
        iterated_params = load_params(
            f"experiments/acrobot/figures/{args.experiment_name}/DQN/{args.max_bellman_iterations}_P_{args.seed}"
        )

        def evaluate(iteration: int, j_list: list, q: FullyConnectedQ, q_weights: jnp.ndarray, horizon: int):
            j_list[iteration] = env.evaluate(
                q,
                q.to_params(q_weights),
                horizon,
                p["n_simulations"],
                video_path=f"{args.experiment_name}/DQN/{iteration}_{args.seed}",
            )

        manager = multiprocessing.Manager()
        iterated_j = manager.list(list(np.nan * np.zeros(args.max_bellman_iterations + 1)))

        processes = []
        for iteration in range(args.max_bellman_iterations + 1):
            processes.append(
                multiprocessing.Process(
                    target=evaluate,
                    args=(
                        iteration,
                        iterated_j,
                        q,
                        q.to_weights(iterated_params[f"{iteration}"]),
                        p["horizon_evaluation"],
                    ),
                )
            )

        for process in processes:
            process.start()

        for process in processes:
            process.join()

        np.save(
            f"experiments/acrobot/figures/{args.experiment_name}/DQN/{args.max_bellman_iterations}_J_{args.seed}.npy",
            iterated_j,
        )
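`DQN_evaluate.py` stores, for each Bellman iteration, the value returned by `env.evaluate(...)` in the saved `J` array. The environment's implementation is not part of this commit; a common definition of such a score is the discounted return averaged over several rollouts of at most `horizon` steps, which the sketch below computes. The `reset`/`step` callables (standing in for the environment acting under a fixed policy), the name `estimate_j`, and the use of a discount such as `gamma_evaluation` are assumptions for illustration.

```python
from typing import Any, Callable, Tuple


def estimate_j(
    reset: Callable[[int], Any],
    step: Callable[[Any], Tuple[Any, float, bool]],
    gamma: float,
    horizon: int,
    n_simulations: int,
) -> float:
    """Average discounted return over `n_simulations` rollouts truncated at `horizon` steps."""
    total = 0.0
    for simulation in range(n_simulations):
        state = reset(simulation)  # the simulation index can serve as a seed
        discounted_return, discount = 0.0, 1.0
        for _ in range(horizon):
            state, reward, absorbing = step(state)
            discounted_return += discount * reward
            discount *= gamma
            if absorbing:
                break  # the episode ended before the horizon
        total += discounted_return
    return total / n_simulations
```

With a reward of 1 at every step, no absorbing state, `gamma=0.5` and `horizon=3`, each rollout returns 1 + 0.5 + 0.25 = 1.75, so the estimate is 1.75.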