A4C

This repository contains the code for A4C introduced in the paper

A4C: Anticipatory Asynchronous Actor Critic

Tharun Medini, Xun Luan, Anshumali Shrivastava

Citation

If you find the idea useful, please cite

@article{anonymous2018anticipatory,
  title={Anticipatory Asynchronous Advantage Actor-Critic (A4C): The power of Anticipation in Deep Reinforcement Learning},
  author={Anonymous},
  journal={International Conference on Learning Representations},
  year={2018},
  url={https://openreview.net/forum?id=rkKkSzb0b}
}

Usage

The repository contains two folders: one for the baseline GA3C and one for the Switching version of A4C. The other two variants, Dependent Updating (DU) and Independent Updating (IU), can be obtained by tweaking switching_time in Config.py inside the Switching folder (see the sketch after the commands below). If switching_time is set very high (beyond 20 hrs = 72000 seconds), the code behaves as DU; if switching_time is set to 0, it behaves as IU. The code structure is essentially the same as GA3C, with the critical differences confined to Config.py, ProcessAgent.py, GameManager.py and Environment.py. We use the same network architecture as GA3C (a convolutional layer with 16 filters of size 8x8, a convolutional layer with 32 filters of size 4x4, and a fully connected layer with 256 units) and the same optimizer. To run the A4C code, change your directory to Switching and run:

sh _clean.sh
sh _train.sh
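
As a quick reference, the three variants map onto switching_time as described above. The values below are illustrative (only the value shown in the Configuration section is actually shipped in Config.py):

# Switching/Config.py -- picking a variant via switching_time (illustrative values)
switching_time = 9000      # Switching: switch after 9000 s (Pong, Qbert, SpaceInvaders)
# switching_time = 72000   # Dependent Updating (DU): longer than a normal run, so no switch happens
# switching_time = 0       # Independent Updating (IU): switch immediately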

Configuration

The folder Switching has the file Config.py with the following chunk of code:

meta_size = 2              # aka step_size: max length of multi-step actions
begin_time = time.time()   # training start time (Config.py imports time earlier)
switching_time = 9000      # seconds of training after which we switch updating modes

Here, meta_size is the max length of multi-step actions. By default, we use up to 2-step actions.
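
As a rough illustration of what meta_size means, assume multi-step actions are ordered sequences of base actions up to length meta_size (this is our reading of the idea, not code from the repo):

# Illustration only: how the candidate action set grows with meta_size.
from itertools import product

base_actions = list(range(6))   # e.g. 6 Atari actions
meta_size = 2

actions = [[a] for a in base_actions]                       # single-step actions
for k in range(2, meta_size + 1):
    actions += [list(seq) for seq in product(base_actions, repeat=k)]

print(len(actions))   # 6 single actions + 36 two-step actions = 42 candidates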

switching_time is the time (in seconds) after which we switch. This should roughly be the time when Dependent Updating starts to decay (9000 for Pong, Qbert and SpaceInvaders; 18000 for BeamRider). Automating this is a little difficult because the rewards are highly variable even after taking a moving average over the last 1000 episodes, so it is hard to judge whether the rewards have saturated. We are working on tracking the moving median of the rewards over the last 1000 episodes and switching when it starts to decrease.
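
A minimal sketch of that heuristic (a hypothetical helper, not part of the released code): keep the last 1000 episode rewards, track the best moving median seen so far, and report a switch once the current median falls below it.

# Sketch: switch when the 1000-episode moving median of rewards starts to drop.
from collections import deque
from statistics import median

recent_rewards = deque(maxlen=1000)
best_median = float('-inf')

def should_switch(episode_reward, tolerance=0.0):
    """Return True once the moving median drops below the best median seen so far."""
    global best_median
    recent_rewards.append(episode_reward)
    if len(recent_rewards) < recent_rewards.maxlen:
        return False                      # not enough history yet
    m = median(recent_rewards)
    if m > best_median:
        best_median = m                   # rewards still improving
        return False
    return m < best_median - tolerance    # median has started to decrease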

Plotting Results

The code in both the GA3C and Switching folders writes out a text file with the total reward for each episode. Running the script plot.py plots a comparison of the two approaches. We can plot against either wall-clock time or episodes by changing the variable plotby in the following chunk:

plotby = 'time'

if plotby=='episodes':
    plt1, = plt.plot(r_mean1,'r',label='GA3C')
    plt2, = plt.plot(r_mean2,'g',label='A4C')
    .....
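
For reference, a minimal sketch of such a comparison plot, assuming each results file holds one "time reward" pair per line (file names and the exact column layout here are placeholders, not the repo's actual format):

# Sketch of a GA3C vs. A4C comparison plot over time or episodes.
import numpy as np
import matplotlib.pyplot as plt

def load(path, window=1000):
    data = np.loadtxt(path)                        # assumed columns: time, episode reward
    t, r = data[:, 0], data[:, 1]
    kernel = np.ones(window) / window
    r_mean = np.convolve(r, kernel, mode='valid')  # moving average over `window` episodes
    return t[window - 1:], r_mean

t1, r_mean1 = load('results_ga3c.txt')             # placeholder file names
t2, r_mean2 = load('results_a4c.txt')

plotby = 'time'
if plotby == 'episodes':
    plt1, = plt.plot(r_mean1, 'r', label='GA3C')
    plt2, = plt.plot(r_mean2, 'g', label='A4C')
else:
    plt1, = plt.plot(t1, r_mean1, 'r', label='GA3C')
    plt2, = plt.plot(t2, r_mean2, 'g', label='A4C')
plt.legend(handles=[plt1, plt2])
plt.show()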
