When I try to fine-tune both LLaMA-2-70B and LLaMA-3.1-70B with LoRA using the same code, the 3.1 model seems to have an unusual loss landscape. Is there anything I should be aware of?
Very hard to say from this information alone. I assume you target the same layers for both, so `print_trainable_parameters` should give you (almost) the same values? Perhaps Llama 3 works better with different hyper-parameters, but I haven't tested it myself.
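For reference, a minimal sketch of the comparison I have in mind, using PEFT's `LoraConfig`, `get_peft_model`, and `print_trainable_parameters`. The model IDs, rank, and `target_modules` here are placeholder assumptions, not values from your setup:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder checkpoints; substitute the ones you actually fine-tune.
for model_id in ("meta-llama/Llama-2-70b-hf", "meta-llama/Meta-Llama-3.1-70B"):
    model = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig(
        r=16,                                 # placeholder rank
        lora_alpha=32,                        # placeholder scaling
        target_modules=["q_proj", "v_proj"],  # same target layers for both models
        task_type="CAUSAL_LM",
    )
    peft_model = get_peft_model(model, config)
    # If the same layers are targeted, the trainable-parameter counts
    # reported here should be (almost) identical across the two models.
    peft_model.print_trainable_parameters()
```

If the two printouts differ noticeably, the LoRA adapters are not being attached to equivalent layers, which could explain a different loss landscape.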