question: expected performance of vq-bet? #341

Open
Jubayer-Hamid opened this issue Jul 25, 2024 · 5 comments
Labels
🧠 Policies Something policies-related 🤔 Question Further information is requested


Jubayer-Hamid commented Jul 25, 2024

Hi,

Thank you to the LeRobot community for maintaining such a fantastic codebase. My research group and I have greatly benefited from your efforts. In my current project, I am using the repository primarily for analyzing algorithms across different environments. I wanted to raise an issue I am encountering with VQ-BeT. I have been using the model on PushT and I want to ensure that the results I am obtaining align with community expectations. If not, I might be using the VQ-BeT repository incorrectly and would appreciate any guidance.

I used the following command: python lerobot/scripts/train.py vqbet pusht

For VQ-BeT, it seems like the maximum success rate is exactly 60%, whereas for Diffusion Policy the maximum success rate is 74%. Below, I have attached the wandb figures for the success rate vs training steps (left is for VQ-BeT and right is for Diffusion Policy):

[Screenshots: wandb success-rate curves, taken Jul 24 2024, for VQ-BeT (left) and Diffusion Policy (right)]

Are these results expected for the algorithm? If not, am I running the wrong commands to reproduce the SOTA results?

Thank you for your assistance.

@aliberts (Collaborator)

Hi there, your results look a bit below what we achieved with our pre-trained policy here, though not too far off either.

How many eval steps did you do?
Could you paste your command or a link to your wandb run to see your config?

@alexander-soare has probably more insights on this

@aliberts aliberts added 🤔 Question Further information is requested 🧠 Policies Something policies-related labels Jul 25, 2024
@alexander-soare (Collaborator)

@Jubayer-Hamid thanks for raising this. VQ-BeT and Diffusion Policy should give about the same results. In fact, the models we have on the hub (DP, VQ-BeT) happen to both give 63.8% success rate with 500 evals.

If you try running evals with:
python lerobot/scripts/eval.py -p path/to/pretrained_model eval.n_episodes=500 eval.batch_size=50 eval.use_async_envs=true use_amp=true, what do you get?
The curves you are showing likely use only 50 evaluation episodes (the default), so the variance is quite high.
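As a rough back-of-envelope sketch (my own illustration, not code from the LeRobot repo), the binomial standard error shows why a 50-episode eval is noisy compared to a 500-episode one:

```python
import math

def success_rate_stderr(p: float, n: int) -> float:
    """Standard error of a success rate estimated from n Bernoulli eval episodes."""
    return math.sqrt(p * (1 - p) / n)

# Assuming a true success rate of ~0.64 (roughly what the hub checkpoints report):
se_50 = success_rate_stderr(0.64, 50)    # ~0.068, i.e. ~6.8 percentage points per std dev
se_500 = success_rate_stderr(0.64, 500)  # ~0.021, i.e. ~2.1 percentage points per std dev
```

With 50 episodes, a one-sigma swing already covers the 60% vs 74% gap seen in a single run, so small eval sets can easily reorder policies by chance.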

@Jubayer-Hamid (Author)

Hi, thanks for the prompt response. After rerunning with 500 evaluation episodes, VQ-BeT's success rate came much closer to Diffusion Policy's.

@YuejiangLIU

Hi LeRobot authors,

Thank you for your fantastic repo!

I wanted to follow up regarding the expected results of VQ-BeT. My collaborator and I ran your checkpoint and config across 500 episodes using the following command:

python lerobot/scripts/eval.py -p lerobot/vqbet_pusht eval.n_episodes=500 eval.batch_size=50

However, our results consistently came out lower than what’s reported on your HF page. Here are the results we obtained on two different GPU machines:

{'avg_sum_reward': 95.71936310205473, 'avg_max_reward': 0.8872214382670427, 'pc_success': 61.0}
{'avg_sum_reward': 99.5425249914288, 'avg_max_reward': 0.8906604772845598, 'pc_success': 61.0}

We're wondering if a recent code update might have impacted the evaluation. Could you please confirm the results for the released checkpoint?

Thanks,
Yuejiang

@Jubayer-Hamid Jubayer-Hamid reopened this Aug 4, 2024

alexander-soare commented Aug 12, 2024

@YuejiangLIU I just ran: python lerobot/scripts/eval.py -p lerobot/vqbet_pusht eval.n_episodes=500 eval.batch_size=50 eval.use_async_envs=true and got

{'avg_sum_reward': 97.27730768077599, 'avg_max_reward': 0.8951385362406257, 'pc_success': 63.800000000000004, 'eval_s': 89.14394330978394, 'eval_ep_s': 0.17828788709640503} (this is as reported at https://huggingface.co/lerobot/vqbet_pusht)

I even ran it without eval.use_async_envs=true (just to match your command exactly) and got the same result.

I'm on commit hash 2252b42.

I'm wondering if this is somehow related to system configuration and hardware. I'm using an Nvidia RTX 3090 on Ubuntu 22. @aliberts any other ideas? (for context, you just need to view @YuejiangLIU's last message and this one - they are falling short by a tiny amount on success rate)
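For what it's worth, a quick two-proportion z-test (my own sketch, plugging in the numbers reported in this thread) suggests that 61.0% vs 63.8% over 500 episodes each is within ordinary sampling noise rather than a clear regression:

```python
import math

def two_prop_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-proportion z-statistic with a pooled success-rate estimate."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 63.8% (hub-reported run) vs 61.0% (Yuejiang's runs), 500 episodes each:
z = two_prop_z(0.638, 500, 0.610, 500)  # ~0.91, well below the 1.96 threshold at 95%
```

So before chasing a hardware explanation, it may just be that both numbers are consistent draws from the same underlying policy; seeds and env nondeterminism could account for the rest.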
