Merge pull request #14 from theovincent/clean_exp

Clean exp
theovincent · Jan 26, 2023 · 7f3bbe4
2 parents c7fcfa2 + 99246ea

Showing 209 changed files with 8,595 additions and 13,993 deletions.
diff --git a/.dockerignore b/.dockerignore
@@ -15,9 +15,19 @@ Dockerfile
 **/*.pdf
 
 # Avoid pushing experiments output
+experiments/lqr/figures
+experiments/chain_walk/figures
 experiments/car_on_hill/figures
+experiments/bicycle_offline/figures
+experiments/bicycle_online/figures
+experiments/acrobot/figures
+experiments/lunar_lander/figures
 **/*.out
 
 # Save optimal value functions of car on hill
+!experiments/chain_walk/figures/data/optimal/Q.npy
+!experiments/chain_walk/figures/data/optimal/V.npy
+!experiments/lqr/figures/data/optimal/W.npy
+!experiments/lqr/figures/data/optimal/V.npy
 !experiments/car_on_hill/figures/data/optimal/Q.npy
 !experiments/car_on_hill/figures/data/optimal/V.npy
diff --git a/.gitignore b/.gitignore
@@ -145,12 +145,19 @@ dmypy.json
 **/*.pdf
 
 # Avoid pushing experiments output
-experiments/bicycle/figures
+experiments/lqr/figures
+experiments/chain_walk/figures
 experiments/car_on_hill/figures
+experiments/bicycle_offline/figures
+experiments/bicycle_online/figures
+experiments/acrobot/figures
 experiments/lunar_lander/figures
-experiments/lunar_lander/transfer_from_ias_cluster.sh
 **/*.out
 
-# Save optimal value functions of car on hill
+# Save optimal values
+!experiments/chain_walk/figures/data/optimal/Q.npy
+!experiments/chain_walk/figures/data/optimal/V.npy
+!experiments/lqr/figures/data/optimal/W.npy
+!experiments/lqr/figures/data/optimal/V.npy
 !experiments/car_on_hill/figures/data/optimal/Q.npy
 !experiments/car_on_hill/figures/data/optimal/V.npy
diff --git a/README.md b/README.md
@@ -2,188 +2,45 @@
 
 ## User installation
 ### Without Docker, with Python 3.8 or 3.9 installed
-In the folder where the code is, create a Python virtual environment, activate it and install the package and its dependencies in editable mode:
+In the folder where the code is, create a Python virtual environment, activate it, update pip and install the package and its dependencies in editable mode:
 ```bash
 python3 -m venv env
 source env/bin/activate
+pip install --upgrade pip
 pip install -e .
 ```
 
 ### With Docker
 Please see the [README](./docker/README.md) file made for that.
 
 ## Run the experiments
-For an `environment` and an `algorithm`, a jupyter notebook running the associated the experience can be found at _experiments/[environment]/[algorithm].ipynb_.
-
-For example, the jupyter notebook _experiments/chain_walk/PBO_linear.ipynb_ trains a linear PBO on the Chain-Walk environment.
-
-To generate the plots with $N$ seeds with $K$ Bellman iterations, you first need to generate the data by running `./experiments/[environment]/run_seeds.sh -n_seeds N -n_bellman_iteration K`
-and then run the jupyter notebook _experiments/[environment]/plots.ipynb_.
-
-### Replicate figures
-Figure 4a with one seed, run
-```Bash
-./experiments/chain_walk/run_seeds.sh --n_seeds 1 --n_bellman_iterations 5
-jupyter nbconvert --to notebook --inplace --execute experiments/chain_walk/plots.ipynb
-```
-You will find Figure 4a at _experiments/chain_walk/figures/distance_to_optimal_V_5.pdf_. The code should take around 5 minutes to run.
-
-Figure 4b with one seed, run
-```Bash
-./experiments/lqr/run_seeds.sh --n_seeds 1 --n_bellman_iterations 2
-jupyter nbconvert --to notebook --inplace --execute experiments/lqr/plots.ipynb
-```
-You will find Figure 4b at _experiments/lqr/figures/distance_to_optimal_Pi_2.pdf_. The code should take around 2 minutes to run.
-
-Figure 5a with one seed, run
-```Bash
-./experiments/car_on_hill/run_seeds.sh --n_seeds 1 --n_bellman_iterations 9
-jupyter nbconvert --to notebook --inplace --execute experiments/car_on_hill/samples.ipynb
-jupyter nbconvert --to notebook --inplace --execute experiments/car_on_hill/plots.ipynb
-```
-You will find Figure 5a at _experiments/car_on_hill/figures/distance_to_optimal_V_9.pdf_. The code should take around 30 minutes to run.
+All the experiments are run the same way: only the environment name changes. Here is an example for LQR.
 
-Figure 5b with one seed, run
+The following command line runs the training and the evaluation of all the algorithms, one after the other:
 ```Bash
-./experiments/bicycle/run_seeds_FQI.sh --n_seeds 1 --n_bellman_iterations 8
-./experiments/bicycle/run_seeds_PBO_deep.sh --n_seeds 1 --n_bellman_iterations 8
-./experiments/bicycle/run_seeds_PBO_linear.sh --n_seeds 1 --n_bellman_iterations 8
-jupyter nbconvert --to notebook --inplace --execute experiments/bicycle/plots.ipynb
+launch_job/lqr/launch_local.sh --experiment_name test --max_bellman_iterations 3 --first_seed 1 --last_seed 1
 ```
-You will find Figure 5b at _experiments/bicycle/figures/seconds_8.pdf_. The code should take around 3 hours to run.
-
-If any problem is encountered, make sure your files match the [file organization](#file-organization) and that the parameters _experiments/[environment]/plots.ipynb_ are matching the data that has been computed so far.
+The expected time to finish the runs is 1 minute.
 
-## Run Car On Hill
-```Bash
-car_on_hill_sample  # to collect the offline dataset 
-car_on_hill_fqi -b 9 -s 1  # to train and evaluate FQI
-car_on_hill_pbo -a linear -b 9 -s 1  # to train a linear PBO
-car_on_hill_pbo_evaluate -a linear -b 9 -s 1  # to evaluate it
-car_on_hill_pbo -a deep -b 9 -s 1  # to train a deep PBO
-car_on_hill_pbo_evaluate -a deep -b 9 -s 1  # to evaluate it
-```
+Once all the training runs are done, you can generate the figures shown in the paper by running the Jupyter notebook *experiments/lqr/plots.ipynb*. In the first cell of the notebook, make sure to set *experiment_name*, *max_bellman_iterations* and *seeds* to match the training runs you launched. You can also inspect the training losses with the Jupyter notebook *experiments/lqr/plots_loss.ipynb*.
 
 ## Run the tests
 Run all tests with
 ```Bash
 pytest
 ```
-The code should take around 1 minutes to run.
+The code should take around 1 minute to run.
 
-## File organization
-```
-📦PBO
- ┣ 📂env  # environment files
- ┣ 📂experiments  # files to run the experiments
- ┃ ┣ 📂bicycle
- ┃ ┃ ┣ 📂figures
- ┃ ┃ ┃ ┣ 📂data
- ┃ ┃ ┃ ┃ ┣ 📂FQI
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy  # data generated by running the experiments
- ┃ ┃ ┃ ┃ ┣ 📂PBO_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_linear_max_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┗ 📂optimal
- ┃ ┃ ┃ ┃   ┗ 📜...npy
- ┃ ┃ ┃ ┗ 📜...pdf  # plots of the experiments generated from plots.ipynb
- ┃ ┃ ┣ 📜FQI.ipynb
- ┃ ┃ ┣ 📜PBO_linear_max_linear.ipynb
- ┃ ┃ ┣ 📜PBO_linear.ipynb
- ┃ ┃ ┣ 📜parameters.json
- ┃ ┃ ┣ 📜plots.ipynb
- ┃ ┃ ┗ 📜run_seeds.sh
- ┃ ┣ 📂car_on_hill
- ┃ ┃ ┣ 📂figures
- ┃ ┃ ┃ ┣ 📂data
- ┃ ┃ ┃ ┃ ┣ 📂FQI
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_linear_max_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┗ 📂optimal
- ┃ ┃ ┃ ┃   ┗ 📜...npy
- ┃ ┃ ┃ ┗ 📜...pdf
- ┃ ┃ ┣ 📜FQI.ipynb
- ┃ ┃ ┣ 📜PBO_linear_max_linear.ipynb
- ┃ ┃ ┣ 📜PBO_linear.ipynb
- ┃ ┃ ┣ 📜optimal.py
- ┃ ┃ ┣ 📜parameters.json
- ┃ ┃ ┣ 📜plots.ipynb
- ┃ ┃ ┗ 📜run_seeds.sh
- ┃ ┣ 📂chain_walk
- ┃ ┃ ┣ 📂figures
- ┃ ┃ ┃ ┣ 📂data
- ┃ ┃ ┃ ┃ ┣ 📂FQI
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂LSPI
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_max_linear
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┣ 📂PBO_optimal
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┃ ┗ 📂optimal
- ┃ ┃ ┃ ┃ ┃ ┗ 📜...npy
- ┃ ┃ ┃ ┗ 📜...pdf
- ┃ ┃ ┣ 📜FQI.ipynb
- ┃ ┃ ┣ 📜LSPI.ipynb
- ┃ ┃ ┣ 📜PBO_linear.ipynb
- ┃ ┃ ┣ 📜PBO_max_linear.ipynb
- ┃ ┃ ┣ 📜PBO_optimal.ipynb
- ┃ ┃ ┣ 📜optimal.ipynb
- ┃ ┃ ┣ 📜parameters.json
- ┃ ┃ ┣ 📜plots.ipynb
- ┃ ┃ ┗ 📜run_seeds.sh
- ┃ ┗ 📂lqr
- ┃   ┣ 📂figures
- ┃   ┃ ┣ 📂data
- ┃   ┃ ┃ ┣ 📂FQI
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┃ ┣ 📂LSPI
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┃ ┣ 📂PBO_custom_linear
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┃ ┣ 📂PBO_linear
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┃ ┣ 📂PBO_optimal
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┃ ┗ 📂optimal
- ┃   ┃ ┃ ┃ ┗ 📜...npy
- ┃   ┃ ┗ 📜...pdf
- ┃   ┣ 📜FQI.ipynb
- ┃   ┣ 📜LSPI.ipynb
- ┃   ┣ 📜PBO_custom_linear.ipynb
- ┃   ┣ 📜PBO_linear.ipynb
- ┃   ┣ 📜PBO_optimal.ipynb
- ┃   ┣ 📜optimal.ipynb
- ┃   ┣ 📜parameters.json
- ┃   ┣ 📜plots.ipynb
- ┃   ┗ 📜run_seeds.sh
- ┣ 📂test  # tests for the environments and the networks
- ┗ 📂pbo  # main code
-```
 
 ## Using a GPU
 In the folder where the code is, create a Python virtual environment, activate it and install the package and its dependencies in editable mode:
 ```bash
 python3 -m venv env_gpu
 source env_gpu/bin/activate
 pip install -e .
-```
-
-If jax does not recognize the gpu, you may need to run
-```bash
 pip install -U jax[cuda11_cudnn82]==0.3.22 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
 ```
-Taken from https://github.com/google/jax/discussions/10323.
+(Taken from https://github.com/google/jax/discussions/10323)
 
 
 ## Using a cluster

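The README asks you to wait until all the trainings are done before running the plots notebook. Below is a minimal sketch of a pre-flight check; it is a hypothetical helper, not part of this commit. It assumes the `{max_bellman_iterations}_J_{seed}.npy` naming that *experiments/acrobot/DQN_evaluate.py* uses in this commit, and that `--first_seed` and `--last_seed` are both inclusive. Other environments or algorithms may name their outputs differently.

```python
from pathlib import Path
from typing import List


def missing_results(
    figures_dir: str,
    experiment_name: str,
    algorithm: str,
    max_bellman_iterations: int,
    first_seed: int,
    last_seed: int,
) -> List[str]:
    """Hypothetical helper: list the evaluation files that have not been written yet."""
    missing = []
    # Assumption: both seed bounds are inclusive (--first_seed 1 --last_seed 1 means one seed).
    for seed in range(first_seed, last_seed + 1):
        # Assumed naming, copied from DQN_evaluate.py: <figures>/<experiment>/<algorithm>/<K>_J_<seed>.npy
        path = Path(figures_dir) / experiment_name / algorithm / f"{max_bellman_iterations}_J_{seed}.npy"
        if not path.exists():
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Example: Acrobot DQN runs named "test", 3 Bellman iterations, seeds 1 to 2.
    for path in missing_results("experiments/acrobot/figures", "test", "DQN", 3, 1, 2):
        print("missing:", path)
```

An empty list means every seed has an evaluation file, so the notebook's *seeds* can safely cover the same range.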
diff --git a/docker/cpu/Dockerfile b/docker/cpu/Dockerfile
@@ -3,6 +3,8 @@ FROM python:3.8.13-buster
 RUN mkdir /workspace
 WORKDIR /workspace
 
+RUN pip install --upgrade pip
+
 COPY . .
 
-RUN pip install -e .[cpu]
+RUN pip install -e .
diff --git a/docker/gpu/Dockerfile b/docker/gpu/Dockerfile
@@ -8,5 +8,6 @@ COPY . .
 RUN apt-get -y update
 RUN apt-get -y install python3
 RUN apt-get -y install python3-pip
+RUN pip install --upgrade pip
 
-RUN pip install -e .[gpu] && pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+RUN pip install -e . && pip install -U jax[cuda11_cudnn82] -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
diff --git a/experiments/__init__.py b/experiments/__init__.py
@@ -0,0 +1,13 @@
+colors = {
+    "LSPI": "#984ea3",
+    "FQI": "#e41a1c",
+    "DQN": "#e41a1c",
+    "ProFQI": "#4daf4a",
+    "ProDQN": "#4daf4a",
+    "blue": "#377eb8",
+    "orange": "#ff7f00",
+    "pink": "#f781bf",
+    "brown": "#a65628",
+    "grey": "#999999",
+    "yellow": "#dede00",
+}
diff --git a/experiments/acrobot/DQN.py b/experiments/acrobot/DQN.py
@@ -0,0 +1,48 @@
+import sys
+import argparse
+import json
+import jax
+
+from experiments.base.parser import addparse
+from experiments.base.print import print_info
+
+
+def run_cli(argvs=sys.argv[1:]):
+    import warnings
+
+    warnings.simplefilter(action="ignore", category=FutureWarning)
+
+    parser = argparse.ArgumentParser("Train DQN on Acrobot.")
+    addparse(parser, seed=True)
+    args = parser.parse_args(argvs)
+    print_info(args.experiment_name, "DQN", "Acrobot", args.max_bellman_iterations, args.seed)
+    p = json.load(open(f"experiments/acrobot/figures/{args.experiment_name}/parameters.json"))  # p for parameters
+
+    from experiments.acrobot.utils import (
+        define_environment,
+        define_q,
+        collect_random_samples,
+        collect_samples,
+        generate_keys,
+    )
+    from pbo.sample_collection.replay_buffer import ReplayBuffer
+    from experiments.base.DQN import train
+
+    sample_key, exploration_key, q_key, _ = generate_keys(args.seed)
+
+    env = define_environment(jax.random.PRNGKey(p["env_seed"]), p["gamma"])
+    replay_buffer = ReplayBuffer(p["max_size"])
+    collect_random_samples(env, sample_key, replay_buffer, p["n_initial_samples"], p["horizon"])
+    q = define_q(
+        env.actions_on_max,
+        p["gamma"],
+        q_key,
+        p["layers_dimension"],
+        learning_rate={
+            "first": p["starting_lr_dqn"],
+            "last": p["ending_lr_dqn"],
+            "duration": args.max_bellman_iterations * p["fitting_steps_dqn"],
+        },
+    )
+
+    train("acrobot", args, q, p, exploration_key, sample_key, replay_buffer, collect_samples, env)
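`DQN.py` passes `define_q` a `learning_rate` dict with the keys `"first"`, `"last"` and `"duration"`, where `duration` is `max_bellman_iterations * fitting_steps_dqn`. How `define_q` turns that dict into a schedule is not shown in this diff. The sketch below assumes a linear decay from `first` to `last` over `duration` steps, held at `last` afterwards (the same shape as `optax.linear_schedule(init_value, end_value, transition_steps)`). The numeric values are hypothetical stand-ins for *parameters.json* entries.

```python
def linear_learning_rate(step: int, first: float, last: float, duration: int) -> float:
    """Assumed schedule: linear interpolation from `first` to `last` over `duration` steps."""
    if duration <= 0:
        return last
    # Clamp the progress to [0, 1] so steps past `duration` keep the final learning rate.
    progress = min(max(step, 0), duration) / duration
    # Weighted form returns exactly `first` at progress 0 and exactly `last` at progress 1.
    return (1.0 - progress) * first + progress * last


# Hypothetical values; the real ones come from parameters.json.
max_bellman_iterations, fitting_steps_dqn = 3, 100
learning_rate = {"first": 1e-3, "last": 1e-5, "duration": max_bellman_iterations * fitting_steps_dqn}

print(linear_learning_rate(0, **learning_rate))  # 0.001
print(linear_learning_rate(300, **learning_rate))  # 1e-05
print(linear_learning_rate(1000, **learning_rate))  # 1e-05
```

Tying `duration` to the total number of fitting steps means the learning rate reaches its final value exactly when the last Bellman iteration ends, whatever `max_bellman_iterations` is.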
diff --git a/experiments/acrobot/DQN_evaluate.py b/experiments/acrobot/DQN_evaluate.py
@@ -0,0 +1,72 @@
+import sys
+import argparse
+import multiprocessing
+import json
+import jax
+import jax.numpy as jnp
+import numpy as np
+
+from experiments.base.parser import addparse
+from experiments.base.print import print_info
+
+
+def run_cli(argvs=sys.argv[1:]):
+    with jax.default_device(jax.devices("cpu")[0]):
+        import warnings
+
+        warnings.simplefilter(action="ignore", category=FutureWarning)
+
+        parser = argparse.ArgumentParser("Evaluate a DQN on Acrobot.")
+        addparse(parser, seed=True)
+        args = parser.parse_args(argvs)
+        print_info(args.experiment_name, "DQN", "Acrobot", args.max_bellman_iterations, args.seed, train=False)
+        p = json.load(open(f"experiments/acrobot/figures/{args.experiment_name}/parameters.json"))  # p for parameters
+
+        from experiments.acrobot.utils import define_environment, define_q
+        from pbo.networks.learnable_q import FullyConnectedQ
+        from pbo.utils.params import load_params
+
+        env = define_environment(jax.random.PRNGKey(p["env_seed"]), p["gamma_evaluation"])
+
+        q = define_q(env.actions_on_max, p["gamma"], jax.random.PRNGKey(0), p["layers_dimension"])
+        iterated_params = load_params(
+            f"experiments/acrobot/figures/{args.experiment_name}/DQN/{args.max_bellman_iterations}_P_{args.seed}"
+        )
+
+        def evaluate(iteration: int, j_list: list, q: FullyConnectedQ, q_weights: jnp.ndarray, horizon: int):
+            j_list[iteration] = env.evaluate(
+                q,
+                q.to_params(q_weights),
+                horizon,
+                p["n_simulations"],
+                video_path=f"{args.experiment_name}/DQN/{iteration}_{args.seed}",
+            )
+
+        manager = multiprocessing.Manager()
+        iterated_j = manager.list(list(np.nan * np.zeros(args.max_bellman_iterations + 1)))
+
+        processes = []
+        for iteration in range(args.max_bellman_iterations + 1):
+            processes.append(
+                multiprocessing.Process(
+                    target=evaluate,
+                    args=(
+                        iteration,
+                        iterated_j,
+                        q,
+                        q.to_weights(iterated_params[f"{iteration}"]),
+                        p["horizon_evaluation"],
+                    ),
+                )
+            )
+
+        for process in processes:
+            process.start()
+
+        for process in processes:
+            process.join()
+
+        np.save(
+            f"experiments/acrobot/figures/{args.experiment_name}/DQN/{args.max_bellman_iterations}_J_{args.seed}.npy",
+            list(iterated_j),
+        )
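`DQN_evaluate.py` evaluates every Bellman iteration in its own process and gathers the returns in a `multiprocessing.Manager` list pre-filled with NaN. The stdlib-only sketch below reproduces that fan-out pattern, with a hypothetical `evaluate_iteration` standing in for `env.evaluate`. It requests the `fork` start method explicitly (POSIX only), which is an assumption: the original passes a function defined inside `run_cli` as the process target, and that only works when processes are forked, because under `spawn` or `forkserver` the target must be picklable. A module-level function is used here for the same reason.

```python
import math
import multiprocessing


def evaluate_iteration(iteration: int, returns) -> None:
    """Hypothetical stand-in for env.evaluate: store a fake return for one iteration."""
    # Each child writes to its own slot, so no two processes touch the same index.
    returns[iteration] = float(iteration) ** 2


def evaluate_all(max_bellman_iterations: int) -> list:
    # "fork" is POSIX-only; it avoids pickling the target and re-importing __main__.
    ctx = multiprocessing.get_context("fork")
    with ctx.Manager() as manager:
        # Pre-fill with NaN so a crashed child leaves a visible gap instead of a stale value.
        returns = manager.list([math.nan] * (max_bellman_iterations + 1))
        processes = [
            ctx.Process(target=evaluate_iteration, args=(iteration, returns))
            for iteration in range(max_bellman_iterations + 1)
        ]
        for process in processes:
            process.start()
        for process in processes:
            process.join()
        # Copy the values out of the proxy before the manager shuts down.
        return list(returns)


if __name__ == "__main__":
    print(evaluate_all(3))  # [0.0, 1.0, 4.0, 9.0]
```

Starting one process per iteration keeps the code short but launches `max_bellman_iterations + 1` processes at once; a bounded pool would be the usual alternative when that number grows.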