
Abnormal performance of training LLaMA3.1-70 via LoRA #2091

Open
junzhang-zj opened this issue Sep 24, 2024 · 3 comments

Comments


junzhang-zj commented Sep 24, 2024

When I fine-tune both LLaMA-2-70B and LLaMA-3.1-70B with LoRA using the same code, the 3.1 model shows an unusual loss curve. Is there anything I should be aware of?

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Skip the default random init of nn.Linear; the weights are overwritten by
    # from_pretrained anyway, so this only speeds up model construction.
    torch.nn.Linear.reset_parameters = lambda self: None

    model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )

    config = LoraConfig(
        r=args.lora_r,
        lora_alpha=args.lora_alpha,
        target_modules=args.lora_target_modules.split(","),
        lora_dropout=args.lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    print("online lora with trained pruned offline model", model)
    print_trainable_parameters(model)
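
For reference, print_trainable_parameters is not defined in the snippet above. A minimal sketch of such a helper follows; this is an assumption about what it does, and PEFT models also expose an equivalent built-in model.print_trainable_parameters() method.

    # Hypothetical helper (not from the original issue): counts trainable vs.
    # total parameters of the PEFT-wrapped model.
    def print_trainable_parameters(model):
        trainable, total = 0, 0
        for _, param in model.named_parameters():
            total += param.numel()
            if param.requires_grad:
                trainable += param.numel()
        print(
            f"trainable params: {trainable} || all params: {total} || "
            f"trainable%: {100 * trainable / total:.4f}"
        )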
@junzhang-zj (Author) commented:

[Screenshot attached: 2024-09-24 09:46:54]

junzhang-zj changed the title from "Abnormal performance of LoRA training LLaMA3.1-70" to "Abnormal performance of training LLaMA3.1-70 via LoRA" on Sep 24, 2024
@BenjaminBossan (Member) commented:

It's very hard to say from this information alone. I assume you target the same layers for both, so print_trainable_parameters should give you (almost) the same values? Perhaps Llama 3 works better with different hyperparameters, but I haven't tested it myself.
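
For illustration, here is a hedged sketch of the kind of hyperparameter changes that could be tried; the values and target modules below are assumptions, not tested recommendations for Llama-3.1-70B.

    # Illustrative only: alternative LoRA settings sometimes tried when a run
    # on a newer base model shows an unstable loss curve. All values here are
    # assumptions, not tested recommendations for Llama-3.1-70B.
    from peft import LoraConfig

    alt_config = LoraConfig(
        r=16,             # try a different rank than the Llama-2 run
        lora_alpha=32,    # keep the alpha/r ratio consistent if r changes
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    # A lower learning rate in the trainer (e.g. 1e-4 instead of 2e-4) is also
    # a common first adjustment when the loss diverges.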

@junzhang-zj (Author) commented:

Thanks for your help. The target layers are the same; I will try other hyperparameters.
