First Pass of MAIRL #8

lalitlal · 2021-03-28T01:07:48Z

Major (in progress)

Create Reward Fn and Value Fn in Discriminator [DONE]

Inside init() or forward()? Added inside forward().

[discriminator_irl.py] - modified discriminator.py [DONE]

Calculate log_p_tau, log_q_tau, log_pq for external losses

[mgail.py] - modify how our discriminator is changed [DONE]

See the Discriminator section of "Fixed std for policy or learned std?" (#2).

[mgail.py] - modify al_loss to take into account new discriminator output [DONE]

Modify expert data to include expert action probs [TODO]

Need this for lprobs

Modify ER to account for new field 'action_probs' [TODO]
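The quantities named in the checklist above (log_p_tau, log_q_tau, log_pq) can be sketched as follows. This is only an illustration, assuming MAIRL follows the usual AIRL-style discriminator in which the reward and value functions are combined into f = r + gamma * V(s') - V(s) and compared against the policy's log-probability of the action (the "lprobs" that the expert data would need to supply). The function names, the scalar inputs, and the default `gamma` are hypothetical and are not taken from discriminator_irl.py or mgail.py.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def airl_terms(reward, value, next_value, log_prob, gamma=0.99):
    """Return (log_p_tau, log_q_tau, log_pq) for one transition.

    reward, value, next_value stand in for the outputs of the reward and
    value functions; log_prob is the policy's log-probability of the action.
    """
    log_p_tau = reward + gamma * next_value - value
    log_q_tau = log_prob
    log_pq = logsumexp2(log_p_tau, log_q_tau)
    return log_p_tau, log_q_tau, log_pq


def discriminator_output(log_p_tau, log_pq):
    """D = exp(f) / (exp(f) + pi(a|s)), computed in log space."""
    return math.exp(log_p_tau - log_pq)


def sketch_al_loss(batch, gamma=0.99):
    """Binary cross-entropy over a batch of transitions.

    Each item is (reward, value, next_value, log_prob, label), where
    label is 1 for an expert transition and 0 for a policy transition.
    """
    total = 0.0
    for reward, value, next_value, log_prob, label in batch:
        log_p, log_q, log_pq = airl_terms(
            reward, value, next_value, log_prob, gamma
        )
        # log D for expert samples, log(1 - D) for policy samples
        total += label * (log_p - log_pq) + (1 - label) * (log_q - log_pq)
    return -total / len(batch)
```

Under this assumption, the loss needs a log-probability for every transition, expert or generated, which is why the two [TODO] items about action probs would have to land before the loss can be evaluated on expert data.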

TO DO

Testing that we didn't break existing code
Testing that MAIRL actually compiles
Testing that MAIRL actually works as expected
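For the planned 'action_probs' field in the ER, one cheap check is that every sampled batch keeps states, actions, and action probabilities aligned. The buffer below is a minimal, hypothetical stand-in written for illustration only. It is not the ER class from this repository, and the class name, method names, and dict-based storage are all assumptions.

```python
import random


class ReplayWithActionProbs:
    """Toy ring buffer storing (state, action, action_probs) together."""

    FIELDS = ("state", "action", "action_probs")

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.next_index = 0  # slot to overwrite once the buffer is full

    def add(self, state, action, action_probs):
        item = {
            "state": state,
            "action": action,
            "action_probs": action_probs,
        }
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            self.storage[self.next_index] = item
        self.next_index = (self.next_index + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        """Sample with replacement; all fields come from the same items."""
        picks = [rng.choice(self.storage) for _ in range(batch_size)]
        return {field: [p[field] for p in picks] for field in self.FIELDS}
```

Storing each transition as a single record, rather than keeping parallel arrays per field, makes it hard for the new field to drift out of step with the existing ones when old entries are overwritten.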


Yip Sang Leung and others added 30 commits February 26, 2021 17:02

Add .gitignore

efe6e69

Introduce minigrid4rooms expert data

806511f

Introduce utils get_d4rl_dataset and load_d4rl_er

5f0842d

Try to remove all qpos and qvel

86db7dc

Use minigrid4rooms expert data

207aacc

Use MiniGrid env

5c39163

Add requirements.txt

8682f33

cleaned up files to use d4rl data and run main script

8ac7378

hardcoded action_space in environment.py and flatten observation in driver.py

1d33dd7

Update .gitignore

26f691a

Introduce script to train expert policy

2cc90c8

Introduce script to run expert policy

d14502d

fixed bug in driver sampling expert replay buffer

8365429

Merge pull request #1 from yipsang/garage_scripts

b37f7ea

Garage scripts to train and run expert policy

changed tensor v1 imports to just use tf v1 for specific functions instead

a099695

Merge pull request #2 from yipsang/clean_up

96f34f7

Clean up

changed driver.py to use first state from expert samples

7dfc481

first pass of using generated data

953d6fa

updating generation to have more diverse sampling

b00b332

added code to plot one of the losses, doesn't seem to save anywhere yet

54d4c5a

bug fix/ comments

d819fe4

plotting avg losses over training, added graphs for 10K training iters

a748e85

pushing new generated data with slight mod to generation script

1a20934

updated env and driver for new data (action space and state sizes changed)

fcb9e89

Merge pull request #3 from yipsang/data_gen

997233b

first pass of using generated data

first pass of modifying ER to get augmented state observation

5e4e49e

Improve TRPO training

2f56a48

- Introduce WithActionObs and OnlyPartialObjAndColor wrappers
- Change max_kl_step to 0.001
- Change the hidden sizes of the policy

hyperparam change for mail

0579097

removing whitespace

09794c9

plumbing to use pybulletgym

8867af1

Lalit Lal and others added 13 commits March 17, 2021 00:24

adding generated data

a661e2e

Merge pull request #4 from yipsang/improve-trpo

040f9c1

Improve TRPO training

key changes to getting pybulletgym working

3865aed

Merge branch 'plot_data' into augmented_er

e58166f

merged in plotting data for hopper

adding humanoid bullet expert data

dbb34a1

adding antbullet

d6b64e9

adding hopperbullet

eec6e82

Merge remote-tracking branch 'origin/hopper_plots' into augmented_er

0b806e2

adding proper requirements

acd9eb4

restoring test noise condition

76499a1

Merge branch 'master' into augmented_er

18363c2

Merge pull request #5 from yipsang/augmented_er

eca1f38

first pass of modifying ER to get augmented state observation

first pass of MAIRL

842c200

lalitlal closed this Mar 28, 2021
