
RuntimeError: shape '[8, 512, 768]' is invalid for input of size 614400 #50

Open
romain-rsr opened this issue Mar 27, 2023 · 1 comment

romain-rsr commented Mar 27, 2023

When running the indicated command for rationale training:

CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg rationale --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
    --final_eval --prompt_format QCM-LE

It leads to the following error:

model parameters:  226643712
***** Running Evaluation *****
  Num examples = 4241
  Batch size = 4
Traceback (most recent call last):
  File "main.py", line 380, in <module>
    T5Trainer(
  File "main.py", line 272, in T5Trainer
    metrics = trainer.evaluate(eval_dataset = test_set)
  File "x/lib/python3.8/site-packages/transformers/trainer_seq2seq.py", line 79, in evaluate
    return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
  File "x/lib/python3.8/site-packages/transformers/trainer.py", line 2758, in evaluate
    output = eval_loop(
  File "x/lib/python3.8/site-packages/transformers/trainer.py", line 2936, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)      
  File "x/lib/python3.8/site-packages/transformers/trainer_seq2seq.py", line 168, in prediction_step
    return super().prediction_step(
  File "x/lib/python3.8/site-packages/transformers/trainer.py", line 3177, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "x/lib/python3.8/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "x/mm-cot/model.py", line 119, in forward
    image_att, _ = self.mha_layer(hidden_states, image_embedding, image_embedding)
  File "x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "x/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1153, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "x/lib/python3.8/site-packages/torch/nn/functional.py", line 5122, in multi_head_attention_forward
    k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[8, 512, 768]' is invalid for input of size 614400
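
For reference, the shapes in this message can be reproduced with a small standalone sketch. This assumes DETR image features of 100 tokens per example projected to the T5-base hidden size of 768 (so 8 × 100 × 768 = 614400), and an nn.MultiheadAttention created without batch_first=True, in which case the 512 text tokens are read as the batch dimension:

import torch
import torch.nn as nn

hidden_size = 768

# Assumption: same call pattern as the mha_layer in model.py, but with
# batch_first left at its default of False, so inputs are interpreted
# as (seq_len, batch, dim) rather than (batch, seq_len, dim).
mha = nn.MultiheadAttention(embed_dim=hidden_size, kdim=hidden_size,
                            vdim=hidden_size, num_heads=1)

hidden_states = torch.randn(8, 512, hidden_size)    # (batch, text_len, dim)
image_embedding = torch.randn(8, 100, hidden_size)  # (batch, image_tokens, dim)

# The key holds 8 * 100 * 768 = 614400 elements, but the layer tries to
# view it as [8, 512, 768], raising the same RuntimeError as above.
mha(hidden_states, image_embedding, image_embedding)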

When running the rationale inference command:

CUDA_VISIBLE_DEVICES=0,1 python main.py \
    --model allenai/unifiedqa-t5-base \
    --user_msg rationale --img_type detr \
    --bs 8 --eval_bs 4 --eval_acc 10 --output_len 512 \
    --final_eval --prompt_format QCM-LE \
    --evaluate_dir models/MM-CoT-UnifiedQA-base-Rationale

I encounter a similar issue:

File "x/lib/python3.8/site-packages/torch/nn/functional.py", line 5122, in multi_head_attention_forward
    k = k.contiguous().view(k.shape[0], bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[4, 512, 768]' is invalid for input of size 307200
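
Under the same assumption as above, the smaller size factors the same way with a batch of 4 instead of 8: 4 × 100 × 768 = 307200, consistent with the same key/query length mismatch.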

I followed each data processing step indicated in the readme, though.

Thanks in advance for your help on this issue.

cooelf commented Oct 15, 2023

Please try the latest version. It should work well.
