Results before and after fixing shard shuffling bug #354
Replies: 4 comments
-
@DonkeyShot21 the paper results were with the fix. The 400m B/16 model was re-run (separately from the paper) w/ some varying hparams and also w/ 'resampling' enabled; the variations were not that significant re the bug or resampling at the lower batch sizes. However, using a higher LR and larger batch size had a bit more impact. Subsequent runs have generally used both a larger batch size and a larger initial LR. EDIT: the graph below includes the set of comparison LAION-400m runs. The far-left column is the original run w/ the shuffle bug. Then there is a run w/ shard resampling (with replacement) enabled at 32k batch size, a run with the shuffle fixed (no resampling), two 64k batch size runs (one with the same LR as the 32k run, one with a higher initial LR), and a ConvNeXt base. The other ViT runs w/o a specified LR were all 5e-4.
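The batch-size/LR interplay above (5e-4 at 32k, a higher initial LR at 64k) follows the usual heuristic of scaling the LR with global batch size. A minimal sketch of that heuristic, assuming the common linear and square-root scaling rules; the exact values used in these runs are not implied:

```python
def scaled_lr(base_lr: float, base_batch: int, batch: int, rule: str = "linear") -> float:
    """Scale a learning rate when changing global batch size.

    'linear' multiplies by the batch-size ratio; 'sqrt' by its square root.
    Both are common heuristics, not a guarantee of matched training dynamics.
    """
    ratio = batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown rule: {rule}")

# e.g. starting from 5e-4 at a 32k global batch, doubling to 64k
print(scaled_lr(5e-4, 32_768, 65_536, "linear"))  # → 0.001
```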
-
Thank you for the quick reply! Nice to see that with a bit of tuning LAION-400M basically matches the results obtained by OpenAI, and thanks for the clarifications! So, in general, do you recommend using resampling or not?
-
Most of the people associated with this project doing at-scale runs have been using resampling. The graph above might suggest it's a bit worse, but there is run-to-run variation, and in runs w/ larger global batch sizes (80-160k) and larger samples-seen we don't see much difference. It is, however, quite a bit more convenient, esp. on larger runs where we enable resampling and set the '# samples per epoch': w/ resampling, for many LAION-2B runs we use 64-256 'epochs' (calling them checkpoint intervals now) and set the
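The distinction being discussed is shard resampling with replacement vs. a strict per-epoch shuffle. A minimal stdlib sketch of the two selection strategies, purely illustrative (this is not OpenCLIP's or webdataset's actual dataloader code, and the shard names are made up):

```python
import random
from collections import Counter

def epoch_shuffle(shards: list[str], seed: int) -> list[str]:
    """Strict epoch: every shard visited exactly once, in random order."""
    order = shards.copy()
    random.Random(seed).shuffle(order)
    return order

def resample(shards: list[str], n: int, seed: int) -> list[str]:
    """Sampling with replacement: draw n shards for one 'checkpoint
    interval'; some shards may repeat and others may be skipped."""
    return random.Random(seed).choices(shards, k=n)

shards = [f"shard-{i:05d}.tar" for i in range(8)]
print(epoch_shuffle(shards, 0))          # a permutation of all 8 shards
print(Counter(resample(shards, 8, 0)))   # duplicates/omissions possible
```

With resampling there is no hard epoch boundary, which is why the '# samples per epoch' becomes a free parameter (effectively a checkpoint interval) rather than something dictated by the dataset size.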
-
Hi, thanks for the awesome repo.
I found this sentence in the readme:
I have a few questions: