
Early Stopping, Learning rate and noise decay #398

Open
kim-mskw opened this issue Aug 5, 2024 · 3 comments


kim-mskw commented Aug 5, 2024

Implement the best practices from the multi-agent RL community and Stable-Baselines3 into our algorithm. Further, analyse similarities between the PettingZoo multi-agent implementation and the current RL implementation of ASSUME. (https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b)

@kim-mskw kim-mskw added the enhancement An optional feature or enhancement label Aug 5, 2024
mthede commented Sep 24, 2024

Quick Review in Code Bases and Literature

Early Stopping

1. Stable-Baselines3

Three possible callbacks: StopTrainingOnRewardThreshold, StopTrainingOnMaxEpisodes and StopTrainingOnNoModelImprovement.

2. Standard Protocol for Cooperative MARL

Meta-analysis on evaluation methodologies of cooperative MARL with proposed recommendations for standardised performance evaluation protocol. Published at NeurIPS 2022. Summary of protocol here.

--> Fixed number of training timesteps and episodes.

3. BenchMARL (Meta Research)

A proposed framework to deal with the fragmented community standards and reproducibility issues highlighted by the analysis above. Also some competitive environments. Published in Journal of Machine Learning Research 2024.

BenchMARL is a Multi-Agent Reinforcement Learning (MARL) training library created to enable reproducibility and benchmarking across different MARL algorithms and environments. Its mission is to present a standardized interface that allows easy integration of new algorithms and environments to provide a fair comparison with existing solutions.

--> Not implemented, callbacks may be customized.

4. PettingZoo

--> Not implemented.

Summary

For comparability and in line with community standards: no early stopping as default.
early_stopping_steps = training_episodes / validation_episodes_interval + 1
early_stopping_threshold = 0

Our current implementation of early stopping is closest to the "no model improvement" callback from SB3. However, suitable default values for steps and threshold are unclear. Should be chosen rather conservatively due to instability of environment and training, but can be useful for experimentation or when dealing with time/computational restrictions. So we'd like to keep the options.
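For reference, the "no model improvement" logic could be sketched as follows. This is a minimal illustration in the spirit of SB3's StopTrainingOnNoModelImprovement callback, not our actual implementation; the parameter names mirror the options above, and the reward history is hypothetical.

```python
# Sketch of a "no model improvement" early-stopping check: stop once the
# best evaluation reward has not improved by more than the threshold over
# the last `early_stopping_steps` evaluations.

def should_stop(eval_rewards, early_stopping_steps, early_stopping_threshold):
    """Return True if training should stop due to a reward plateau."""
    if len(eval_rewards) <= early_stopping_steps:
        return False  # not enough evaluations yet
    best_before = max(eval_rewards[:-early_stopping_steps])
    best_recent = max(eval_rewards[-early_stopping_steps:])
    return best_recent - best_before <= early_stopping_threshold

# With a conservative threshold of 0, training only stops once the best
# reward has fully plateaued for `early_stopping_steps` evaluations.
rewards = [1.0, 2.0, 3.0, 3.0, 2.9, 3.0]
print(should_stop(rewards, early_stopping_steps=3, early_stopping_threshold=0))  # True
```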

Future development: could be restructured and generalized as callbacks.

mthede commented Sep 24, 2024

Action Noise (Decay)

1. Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance

Published by Jakob Hollenstein et al. (2022)

▶️ Recommendation (among others regarding noise type, noise scale and impact factor):

In general ▷ use a scheduler [...] Finally we recommend a scheduled reduction of the action noise impact factor β over the training progress to improve robustness to the action noise configuration.

But the type of scheduler didn't seem to be relevant: the performance of linear and logistic schedulers was similar.
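The two scheduler shapes compared in the paper could look roughly like this; the function and parameter names are illustrative, not taken from the paper's code.

```python
import math

# Two decay shapes for an action-noise impact factor beta over training
# progress in [0, 1]: linear and logistic, both going from ~1 down to ~0.

def linear_beta(progress):
    """Linear decay: beta = 1 at the start, 0 at the end of training."""
    return 1.0 - progress

def logistic_beta(progress, steepness=10.0, midpoint=0.5):
    """Logistic (sigmoid-shaped) decay around a midpoint of training."""
    return 1.0 / (1.0 + math.exp(steepness * (progress - midpoint)))

for p in (0.0, 0.5, 1.0):
    print(f"progress={p:.1f}: linear={linear_beta(p):.3f} logistic={logistic_beta(p):.3f}")
```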

2.1 Parameter Space Noise for Exploration

Published by Matthias Plappert et al. (2018)

  • “add noise directly to the agent’s parameters” instead of added action noise
  • “adapting the scale of the parameter space noise over time” with simple heuristic for time-varying scale

2.2 OpenAI Baselines

Implementation of Parameter Space Noise for DDPG. Image from OpenAI blog article.
▶️ Quite some overhead and OpenAI Baselines repo is no longer developed.

3. NoisyNet: Noisy Networks for Exploration

Published by Fortunato et al. (2019), also followed by a US Patent (2019, 2024) from DeepMind.

  • stochastic network layers for exploration

▶️ Would need further manual implementation. Seems to be used in other (more recent) publications.

4. PettingZoo

Action noise decay in tutorials with MARL libraries.

4.1 AgileRL

  • e.g. MATD3: “manually implemented” exponential decay in tutorial, no scheduler offered by AgileRL
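The pattern used in the AgileRL MATD3 tutorial can be sketched as a simple per-episode multiplicative decay with a floor; the constants below are illustrative, not the tutorial's values.

```python
# "Manually implemented" exponential decay of the action-noise scale:
# multiply by a fixed factor each episode until a minimum scale is reached.

def exp_decay(initial_scale=0.1, decay=0.99, floor=0.01, episodes=500):
    """Return the per-episode noise scales for an exponential decay."""
    scale = initial_scale
    history = []
    for _ in range(episodes):
        history.append(scale)
        scale = max(floor, scale * decay)
    return history

scales = exp_decay()
print(scales[0], scales[-1])  # 0.1 0.01
```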

4.2.1 Ray (PettingZoo x DQN):

  • EpsilonGreedy with annealing
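Epsilon-greedy with linear annealing, in the spirit of Ray RLlib's EpsilonGreedy exploration, could be sketched as follows; names and constants are illustrative.

```python
import random

# Epsilon is annealed linearly from `initial` to `final` over `anneal_steps`
# environment steps, then held constant.

def epsilon(step, initial=1.0, final=0.05, anneal_steps=10_000):
    frac = min(step / anneal_steps, 1.0)
    return initial + frac * (final - initial)

def select_action(q_values, step, rng=random):
    """Explore with probability epsilon(step), otherwise pick the greedy action."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)    # exploit

print(epsilon(0), round(epsilon(5_000), 3))  # 1.0 0.525
```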

4.2.2 Ray (in total 17 possible choices of exploration with 5 different general schedulers).

Summary

NoisyNet or Parameter Space Noise interesting, but unclear if implementation effort would be justified. Generally, scheduling of decaying action noise should be implemented to improve performance.

▶️ A simple (e.g. linear) but effective scheduling of the action noise scale is preferred for now.

If early stopping is enabled: since the scheduling runs over a fixed number of timesteps/episodes, a warning message should be generated stating that results may improve with further training, because the noise decay was not fully performed.
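That interaction could be sketched as a small check invoked when early stopping triggers; all names here are illustrative, not from our codebase.

```python
import warnings

# If early stopping fires before the scheduled noise decay has completed,
# warn the user that the run may be under-trained relative to the schedule.

def check_noise_decay(current_step, decay_end_step):
    """Call when early stopping triggers; warns if the decay was cut short."""
    if current_step < decay_end_step:
        warnings.warn(
            f"Early stopping triggered at step {current_step}, but the "
            f"action-noise decay is scheduled until step {decay_end_step}; "
            "results may improve if training runs until the decay completes."
        )
        return True
    return False
```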

mthede commented Sep 26, 2024

Learning Rate Decay

Sidenote: Currently Adam optimizer is used - Adam adapts/decays the individual learning rates of parameters automatically. However, an additional scheduling of the learning rate may still improve performance as discussed here.
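To illustrate the sidenote: Adam rescales each parameter's update adaptively, yet the global base rate lr remains a free knob that an external schedule can still decay. A minimal single-parameter Adam step (following Kingma & Ba), with illustrative constants:

```python
# One bias-corrected Adam update for a single scalar parameter.
# `lr` is the global base learning rate that an external schedule could decay.

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

# The first step moves by ~lr regardless of the gradient's magnitude.
p, m, v = adam_step(1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(p)
```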

1.1 Stable-Baselines3

  • General schedule function that can be used for learning rate decay
  • Default: constant
  • Implementations: TD3 --> constant; DQN --> linear decay
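SB3 accepts a callable mapping progress_remaining (1.0 at the start of training, 0.0 at the end) to the current learning rate; the linear schedule from the SB3 docs looks roughly like this:

```python
# SB3-style schedule: a closure over the initial value that is called
# with `progress_remaining`, which decreases from 1.0 to 0.0 over training.

def linear_schedule(initial_value):
    def schedule(progress_remaining):
        return progress_remaining * initial_value
    return schedule

lr = linear_schedule(3e-4)
print(lr(1.0), lr(0.5))  # 0.0003 0.00015
```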

1.2. SB3 Zoo (Training Framework)

  • Linear and constant schedule used
  • Hyperparameter tuning for choice of scheduler, e.g. for PPO

2. Ray RL Library Scheduler

  • General scheduling capabilities
  • See comment on action noise decay

3. PyTorch LR Scheduler

  • 15 different schedulers available
  • "Learning rate scheduling should be applied after optimizer’s update"

4. Learning to Learn Learning-Rate Schedules

Series of conference papers on learning learning rates. Latest publication on a GreedyLR scheduler that ...

... outperforms several state-of-the-art schedulers in terms of accuracy, speed, and convergence. Furthermore, our method is easy to implement, computationally efficient, and requires minimal hyperparameter tuning.

This seems promising, but unfortunately no code is provided. It is based on PyTorch's ReduceLROnPlateau scheduler.
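The ReduceLROnPlateau logic that GreedyLR builds on can be sketched as follows: shrink the learning rate when a monitored metric stops improving for a given patience. A greedy variant would additionally increase the rate while the metric improves; that extension is omitted here, and all names are illustrative.

```python
# Plateau-based LR reduction: halve the learning rate (down to `min_lr`)
# once the monitored metric (lower is better) fails to improve for more
# than `patience` consecutive evaluations.

class PlateauScheduler:
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor, self.patience, self.min_lr = lr, factor, patience, min_lr
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, metric):
        """Update the schedule with a new metric value; return the current lr."""
        if metric < self.best:
            self.best, self.bad_evals = metric, 0
        else:
            self.bad_evals += 1
            if self.bad_evals > self.patience:
                self.lr = max(self.min_lr, self.lr * self.factor)
                self.bad_evals = 0
        return self.lr
```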

Summary

Learning rate scheduling can be beneficial - also for the Adam optimizer - and can reduce runtimes. Current developments show great potential, but an implementation would need to be derived from the publication. PyTorch offers many schedulers out of the box, but only specifically for the learning rate.

The SB3 implementation provides a general scheduler which can be used for both learning rate and action noise decay.
▶️ As we'd like to implement it anyway for the action noise decay, it can be used for the learning rate as well. Whether to switch to PyTorch's internal LR scheduler can be discussed in the future.
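The idea of one general schedule driving both quantities could be sketched as follows; the function and constants are illustrative, not our actual implementation.

```python
# One SB3-style schedule factory (a callable of `progress_remaining`,
# which goes from 1.0 to 0.0 over training) reused for both the learning
# rate and the action-noise scale.

def linear_schedule(initial, final=0.0):
    def schedule(progress_remaining):
        return final + progress_remaining * (initial - final)
    return schedule

lr_schedule = linear_schedule(3e-4)              # decays to 0
noise_schedule = linear_schedule(0.2, final=0.01)  # decays to a floor of 0.01

progress_remaining = 0.25  # i.e. 75% through training
print(lr_schedule(progress_remaining), noise_schedule(progress_remaining))
```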
