question: expected performance of vq-bet? #341

Open
Jubayer-Hamid opened this issue Jul 25, 2024 · 5 comments
Labels
🧠 Policies Something policies-related 🤔 Question Further information is requested


Jubayer-Hamid commented Jul 25, 2024

Hi,

Thank you to the LeRobot community for maintaining such a fantastic codebase. My research group and I have greatly benefited from your efforts. In my current project, I am using the repository primarily for analyzing algorithms across different environments. I wanted to raise an issue I am encountering with VQ-BeT. I have been using the model on PushT and I want to ensure that the results I am obtaining align with community expectations. If not, I might be using the VQ-BeT repository incorrectly and would appreciate any guidance.

I used the following command: python lerobot/scripts/train.py vqbet pusht

For VQ-BeT, it seems like the maximum success rate is exactly 60%, whereas for Diffusion Policy the maximum success rate is 74%. Below, I have attached the wandb figures for the success rate vs training steps (left is for VQ-BeT and right is for Diffusion Policy):

[Screenshots: wandb success-rate curves, taken Jul 24 2024, for VQ-BeT (left) and Diffusion Policy (right)]

Are these results expected for the algorithm? If not, am I running the wrong commands to reproduce the SOTA results?

Thank you for your assistance.

@aliberts (Collaborator)

Hi there, your results look a bit below what we achieved with our pre-trained policy here, though not too far off either.

How many eval steps did you do?
Could you paste your command or a link to your wandb run to see your config?

@alexander-soare has probably more insights on this

@aliberts aliberts added 🤔 Question Further information is requested 🧠 Policies Something policies-related labels Jul 25, 2024
@alexander-soare (Collaborator)

@Jubayer-Hamid thanks for raising this. VQ-BeT and Diffusion Policy should give about the same results. In fact, the models we have on the hub (DP, VQ-BeT) happen to both give 63.8% success rate with 500 evals.

If you try running evals with:
python lerobot/scripts/eval.py -p path/to/pretrained_model eval.n_episodes=500 eval.batch_size=50 eval.use_async_envs=true use_amp=true, what do you get?
The curves you are showing likely use only 50 evaluation episodes (the default), so the variance is quite high.
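As a rough back-of-envelope sketch (my own illustration, not code from the LeRobot repo), the binomial standard error shows why a 50-episode eval is noisy compared to a 500-episode one:

```python
import math

def success_rate_stderr(p: float, n: int) -> float:
    """Standard error of a success rate estimated from n Bernoulli eval episodes."""
    return math.sqrt(p * (1 - p) / n)

# Assuming a true success rate of ~0.64 (roughly what the hub checkpoints report):
se_50 = success_rate_stderr(0.64, 50)    # ~0.068, i.e. ~6.8 percentage points per std dev
se_500 = success_rate_stderr(0.64, 500)  # ~0.021, i.e. ~2.1 percentage points per std dev
```

With 50 episodes, a one-sigma swing already covers the 60% vs 74% gap seen in a single run, so small eval sets can easily reorder policies by chance.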

@Jubayer-Hamid (Author)

Hi, thanks for the prompt response. After rerunning with 500 evaluation episodes, VQ-BeT's success rate came much closer to Diffusion Policy's.

@YuejiangLIU

Hi LeRobot authors,

Thank you for your fantastic repo!

I wanted to follow up regarding the expected results of VQ-BeT. My collaborator and I ran your checkpoint and config across 500 episodes using the following command:

python lerobot/scripts/eval.py -p lerobot/vqbet_pusht eval.n_episodes=500 eval.batch_size=50

However, our results consistently came out lower than what’s reported on your HF page. Here are the results we obtained on two different GPU machines:

{'avg_sum_reward': 95.71936310205473, 'avg_max_reward': 0.8872214382670427, 'pc_success': 61.0}
{'avg_sum_reward': 99.5425249914288, 'avg_max_reward': 0.8906604772845598, 'pc_success': 61.0}

We're wondering if a recent code update might have impacted the evaluation. Could you please confirm the results for the released checkpoint?

Thanks,
Yuejiang

@Jubayer-Hamid Jubayer-Hamid reopened this Aug 4, 2024

alexander-soare commented Aug 12, 2024

@YuejiangLIU I just ran: python lerobot/scripts/eval.py -p lerobot/vqbet_pusht eval.n_episodes=500 eval.batch_size=50 eval.use_async_envs=true and got

{'avg_sum_reward': 97.27730768077599, 'avg_max_reward': 0.8951385362406257, 'pc_success': 63.800000000000004, 'eval_s': 89.14394330978394, 'eval_ep_s': 0.17828788709640503} (this is as reported at https://huggingface.co/lerobot/vqbet_pusht)

I even ran it without eval.use_async_envs=true (just to match your command exactly) and got the same result.

I'm on commit hash 2252b42.

I'm wondering if this is somehow related to system configuration and hardware. I'm using an Nvidia RTX 3090 on Ubuntu 22. @aliberts any other ideas? (for context, you just need to view @YuejiangLIU's last message and this one - they are falling short by a tiny amount on success rate)
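For what it's worth, a quick two-proportion z-test (my own sketch, plugging in the numbers reported in this thread) suggests that 61.0% vs 63.8% over 500 episodes each is within ordinary sampling noise rather than a clear regression:

```python
import math

def two_prop_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-proportion z-statistic with a pooled success-rate estimate."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 63.8% (hub-reported run) vs 61.0% (Yuejiang's runs), 500 episodes each:
z = two_prop_z(0.638, 500, 0.610, 500)  # ~0.91, well below the 1.96 threshold at 95%
```

So before chasing a hardware explanation, it may just be that both numbers are consistent draws from the same underlying policy; seeds and env nondeterminism could account for the rest.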
