
Is it normal that the pipeline starts with a huge loss? #8

Open
qy1026 opened this issue Jul 2, 2024 · 3 comments

Comments

qy1026 commented Jul 2, 2024

step 10:
{'loss': 119743.8516, 'grad_norm': 938286.7284407256, 'learning_rate': 2.0161290322580643e-09, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -128.30323791503906, 'logps/chosen': -178.66146850585938, 'logits/rejected': -0.7681801915168762, 'logits/chosen': -0.792536735534668, 'epoch': 0.0}

step 20:
{'loss': 119688.3056, 'grad_norm': 1090985.982531398, 'learning_rate': 2.0161290322580644e-08, 'rewards/chosen': -8.749030530452728e-05, 'rewards/rejected': 0.00024323315301444381, 'rewards/accuracies': 0.2222222238779068, 'rewards/margins': -0.00033072344376705587, 'logps/rejected': -102.9691390991211, 'logps/chosen': -104.48147583007812, 'logits/rejected': -0.30933287739753723, 'logits/chosen': -0.3230978548526764, 'epoch': 0.0}

step 30:
{'loss': 122734.3, 'grad_norm': 677227.7630694123, 'learning_rate': 4.032258064516129e-08, 'rewards/chosen': -0.00015188578981906176, 'rewards/rejected': 3.675480911624618e-05, 'rewards/accuracies': 0.20000000298023224, 'rewards/margins': -0.00018864059529732913, 'logps/rejected': -132.24008178710938, 'logps/chosen': -116.12632751464844, 'logits/rejected': -0.4473434388637543, 'logits/chosen': -0.4207238554954529, 'epoch': 0.01}

I am surprised by such a huge loss. Is this normal?

Jackory commented Jul 9, 2024

Same issue for me.

kaykyr commented Jul 10, 2024

Same here, using my adapted algorithm (https://github.com/kaykyr/SPPO) to train with the model loaded in 4-bit quantization:

{'loss': 139721.6528, 'learning_rate': 4.1666666666666667e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -5881.09033203125, 'logps/chosen': -5823.2236328125, 'logits/rejected': -0.2584860622882843, 'logits/chosen': -0.2584828734397888, 'epoch': 0.25}
 25%|█████████████████████████████████▎                                                                                                   | 10/40 [07:28<22:20, 44.68s/it]

angelahzyuan (Collaborator) commented

Yes, the model does start with a big loss. In the SPPO loss from the paper, eta = 1/beta with beta = 1e-3, so eta = 1000, and the squared-error terms start on the order of (eta/2)^2.
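To see why the initial loss lands in this range, here is a minimal sketch of the squared-error form of the SPPO objective from the paper (not the repo's exact implementation; the hard preference labels `p_chosen=1.0`, `p_rejected=0.0` are an assumption for illustration). At initialization the policy equals the reference model, so both log-probability ratios are ~0 and the loss is dominated by the eta*(P_hat - 1/2) targets:

```python
# Hypothetical sketch of the SPPO squared loss; hard labels assumed.
beta = 1e-3
eta = 1.0 / beta  # eta = 1/beta = 1000, as stated above

def sppo_loss(logratio_chosen, logratio_rejected,
              p_chosen=1.0, p_rejected=0.0):
    # Squared error between the policy/reference log-ratio and the
    # scaled preference target eta * (P_hat - 1/2) for each response.
    loss_c = (logratio_chosen - eta * (p_chosen - 0.5)) ** 2
    loss_r = (logratio_rejected - eta * (p_rejected - 0.5)) ** 2
    return 0.5 * (loss_c + loss_r)

# At init the policy matches the reference, so both log-ratios are ~0:
print(sppo_loss(0.0, 0.0))  # (eta/2)^2 = 250000.0
```

With eta = 1000 the loss at step 0 is on the order of 1e5, which matches the magnitudes reported above; the reported ~1.2e5 differs only by the exact labels/averaging used.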
