
Abnormal performance of training LLaMA3.1-70 via LoRA #2091

Open
junzhang-zj opened this issue Sep 24, 2024 · 3 comments

Comments


junzhang-zj commented Sep 24, 2024

When I fine-tune both LLaMA-2-70B and LLaMA-3.1-70B with LoRA using the same code, the 3.1 model shows an unusual loss curve. Is there anything I should be aware of?

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Skip the default random init of nn.Linear; the weights are overwritten by
    # from_pretrained anyway, so this only speeds up model construction.
    torch.nn.Linear.reset_parameters = lambda self: None

    model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )

    config = LoraConfig(
        r=args.lora_r,
        lora_alpha=args.lora_alpha,
        target_modules=args.lora_target_modules.split(","),
        lora_dropout=args.lora_dropout,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    print("online lora with trained pruned offline model", model)
    print_trainable_parameters(model)
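
For reference, print_trainable_parameters is not defined in the snippet above. A minimal sketch of such a helper follows; this is an assumption about what it does, and PEFT models also expose an equivalent built-in model.print_trainable_parameters() method.

    # Hypothetical helper (not from the original issue): counts trainable vs.
    # total parameters of the PEFT-wrapped model.
    def print_trainable_parameters(model):
        trainable, total = 0, 0
        for _, param in model.named_parameters():
            total += param.numel()
            if param.requires_grad:
                trainable += param.numel()
        print(
            f"trainable params: {trainable} || all params: {total} || "
            f"trainable%: {100 * trainable / total:.4f}"
        )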
@junzhang-zj (Author) commented:

[Screenshot attached: 2024-09-24 09:46:54]

junzhang-zj changed the title from "Abnormal performance of LoRA training LLaMA3.1-70" to "Abnormal performance of training LLaMA3.1-70 via LoRA" on Sep 24, 2024
@BenjaminBossan (Member) commented:

It's very hard to say from this information alone. I assume you target the same layers for both, so print_trainable_parameters should give you (almost) the same values? Perhaps Llama 3 works better with different hyperparameters, but I haven't tested it myself.
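
For illustration, here is a hedged sketch of the kind of hyperparameter changes that could be tried; the values and target modules below are assumptions, not tested recommendations for Llama-3.1-70B.

    # Illustrative only: alternative LoRA settings sometimes tried when a run
    # on a newer base model shows an unstable loss curve. All values here are
    # assumptions, not tested recommendations for Llama-3.1-70B.
    from peft import LoraConfig

    alt_config = LoraConfig(
        r=16,             # try a different rank than the Llama-2 run
        lora_alpha=32,    # keep the alpha/r ratio consistent if r changes
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    # A lower learning rate in the trainer (e.g. 1e-4 instead of 2e-4) is also
    # a common first adjustment when the loss diverges.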

@junzhang-zj (Author) commented:

Thanks for your help. The target layers are the same; I will try other hyperparameters.
