Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducing published score #41

Closed
htdt opened this issue Sep 8, 2019 · 6 comments
Closed

Reproducing published score #41

htdt opened this issue Sep 8, 2019 · 6 comments

Comments

@htdt
Copy link

htdt commented Sep 8, 2019

Thank you very much for sharing this implementation!
There is a minor issue, as far as i understand, here should be steps=args.pretraining_steps.

I tried to run the experiment with python -m scripts.run_probe --method infonce-stdim --env-name MsPacmanNoFrameskip-v4 --pretraining-steps 100000 couple of times, mean f1 is ≈0.65. In the paper score is 0.7. Other methods also show a slightly lower score, although "supervised" one matches exactly. Am I missing some hyperparameter or it's just a typical score fluctuation?

@ankeshanand
Copy link
Member

ankeshanand commented Sep 8, 2019

@htdt Thanks, we'll fix that typo! That might have creeped in when we were cleaning up the code!

On the results front, the F1 score you get after running the code is the mean across all probes. The F1 score we report (and mention) in the paper is the mean across every category (we do this to avoid certain categories dominating the mean F1 score).

The averaging code is a post-processing step right now (see Table 2 at https://github.com/ankeshanand/atari-representation-learning/blob/master/notebooks/summarize_results.ipynb) but we should try to integrate it in the main code, or be explicit about the reported score! Thanks for pointing this out as well!

@ankeshanand
Copy link
Member

@eracah Can we report the category-wise mean after running probes as well? Would be less confusing to people!

@ankeshanand
Copy link
Member

I have added a temporary warning here: d8412cb

@htdt
Copy link
Author

htdt commented Sep 8, 2019

clear now, makes sense!

@htdt htdt closed this as completed Sep 8, 2019
@ankeshanand ankeshanand pinned this issue Sep 8, 2019
@eracah eracah reopened this Sep 8, 2019
@eracah
Copy link
Collaborator

eracah commented Sep 8, 2019

I think pull request #42 will fix this issue. @ankeshanand, can you check it and merge it in

@eracah
Copy link
Collaborator

eracah commented Sep 8, 2019

Nvm, I merged it. Should be solved now. @htdt, feel free to reopen if you have any issues

@eracah eracah closed this as completed Sep 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants