Same reward throughout the training in DDPG #1233

Open
Siddhu2502 opened this issue May 20, 2024 · 0 comments
Siddhu2502 commented May 20, 2024

agent = DRLAgent(env=env_train)
DDPG_PARAMS = {
    "batch_size": 4096,
    "buffer_size": 1000000,
    "learning_rate": 0.0003,
    "learning_starts": 100,
    "tau": 0.02,
}

model_ddpg = agent.get_model("ddpg", model_kwargs=DDPG_PARAMS)

# Train the DDPG agent
trained_ddpg = agent.train_model(
    model=model_ddpg,
    tb_log_name="ddpg",
    total_timesteps=50000,
)
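One thing worth noting: the DDPG_PARAMS above does not set Stable-Baselines3's `action_noise` argument, and DDPG's policy is deterministic, so without exploration noise it can saturate and repeat the same action at every step. Below is a minimal, self-contained NumPy sketch of the Ornstein-Uhlenbeck noise process commonly paired with DDPG; the `theta`/`sigma` values are illustrative, not taken from this issue.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu = mu * np.ones(size)
        self.theta = theta
        self.sigma = sigma
        self.state = self.mu.copy()
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # Mean-reverting step: dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * self.rng.standard_normal(len(self.state))
        self.state = self.state + dx
        return self.state

# Perturb a deterministic action and clip back into the action space
noise = OUNoise(size=3)
noisy_action = np.clip(np.zeros(3) + noise.sample(), -1.0, 1.0)
```

With Stable-Baselines3 (which FinRL wraps), the equivalent would be passing an `action_noise` instance (e.g. `NormalActionNoise` or `OrnsteinUhlenbeckActionNoise` from `stable_baselines3.common.noise`) into the model's keyword arguments.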
----------------------------------
| time/              |           |
|    episodes        | 4         |
|    fps             | 29        |
|    time_elapsed    | 189       |
|    total_timesteps | 5608      |
| train/             |           |
|    actor_loss      | -11.6     |
|    critic_loss     | 0.0618    |
|    learning_rate   | 0.0003    |
|    n_updates       | 5507      |
|    reward          | 0.5398047 |
----------------------------------
day: 1401, episode: 10
begin_total_asset: 100000.00
end_total_asset: 259256.35
total_reward: 159256.35
total_cost: 138.56
total_trades: 72857
Sharpe: 0.778
=================================
----------------------------------
| time/              |           |
|    episodes        | 8         |
|    fps             | 29        |
|    time_elapsed    | 386       |
|    total_timesteps | 11216     |
| train/             |           |
|    actor_loss      | -3.94     |
|    critic_loss     | 0.0111    |
|    learning_rate   | 0.0003    |
|    n_updates       | 11115     |
|    reward          | 0.5398047 |
----------------------------------
----------------------------------
| time/              |           |
|    episodes        | 12        |
|    fps             | 28        |
|    time_elapsed    | 584       |
|    total_timesteps | 16824     |
| train/             |           |
|    actor_loss      | -1.22     |
|    critic_loss     | 0.0419    |
|    learning_rate   | 0.0003    |
|    n_updates       | 16723     |
|    reward          | 0.5398047 |
----------------------------------

I am trying to use DDPG with the StockTradingEnv provided by FinRL. The reward is the same across all episodes, and the same issue shows up when plotting the buys, sells, and holds of the stocks:

df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_ddpg,
    environment=e_trade_gym,
)
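The all-zero actions can be confirmed directly on the returned frame. A small sketch with a toy DataFrame standing in for `df_actions` (the ticker column names here are hypothetical):

```python
import pandas as pd

# Stand-in for the df_actions frame returned by DRL_prediction
df_actions = pd.DataFrame({"AAPL": [0, 0, 0], "MSFT": [0, 0, 0]})

# True if every cell is zero
all_zero = (df_actions == 0).all().all()
# Fraction of rows where no trade is made at all
frac_zero_rows = (df_actions == 0).all(axis=1).mean()
print(all_zero, frac_zero_rows)  # True 1.0
```

If `frac_zero_rows` is 1.0 on the real output, the policy never trades, which is consistent with the constant `reward` value in the training logs above.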

The entire actions table is just 0s from the first row onward. Performance is far worse than SAC, and training DDPG for 1,000 timesteps gives the same result as training for 10k timesteps.

Am I missing something? Is it the hyperparameters?

@ndronen @lcavalie @dubodog @kruzel

@Siddhu2502 changed the title from "Same reward through the training in DDPG" to "Same reward thought the training in DDPG" on May 20, 2024