Update scaled_dot_product_attention to work with >6 inputs in latest torch version #2021

Merged

Conversation

@ZachNagengast (Contributor) commented Oct 20, 2023

In torch 2.1, scaled_dot_product_attention now has 7 inputs, whereas 2.0 has only 6.

2.0 docs: https://pytorch.org/docs/2.0/generated/torch.nn.functional.scaled_dot_product_attention.html#torch.nn.functional.scaled_dot_product_attention
2.1 docs: https://pytorch.org/docs/2.1/generated/torch.nn.functional.scaled_dot_product_attention.html#torch.nn.functional.scaled_dot_product_attention
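
For reference, the two signatures side by side, as documented at the links above (2.1 adds the trailing optional scale keyword):

# torch 2.0 (6 inputs):
scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False)
# torch 2.1 (7 inputs):
scaled_dot_product_attention(query, key, value, attn_mask=None, dropout_p=0.0, is_causal=False, scale=None)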

This PR allows both interfaces to work by simply ignoring the new input. The issue arises for any conversion that uses this op on torch 2.1:

ValueError: node hidden_states.11 (scaled_dot_product_attention) got 7 input(s), expected [6]

@TobyRoseman (Collaborator)

Thanks for the review @ZachNagengast. I actually have a fix for this issue locally that I was planning to put up for a PR soon. I was waiting to bundle it with a few other small changes.

We should raise an error if the 7th value (i.e. the scale) is not None. We should also raise an error if the number of parameters is more than 7. Feel free to make these changes, or I can put up my fix.
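
A minimal sketch of that validation, assuming the converter's inputs list holds MIL Vars with a .val attribute (the helper name and error messages are illustrative, not taken from the PR):

def _check_sdpa_inputs(inputs):
    # torch 2.0 passes 6 inputs; torch 2.1 appends a 7th, the optional scale.
    if len(inputs) > 7:
        raise ValueError(
            f"scaled_dot_product_attention got {len(inputs)} input(s), expected at most 7"
        )
    if len(inputs) == 7 and inputs[6] is not None and inputs[6].val is not None:
        # The converter ignores scale, so only the default (None) is supported.
        raise ValueError("scaled_dot_product_attention: non-None scale is not supported")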

@ZachNagengast (Contributor, Author)

Yeah, no problem, I'll add it here shortly.

@ZachNagengast (Contributor, Author)

@TobyRoseman This should be all set, although I'm not sure how to trigger CI here.

@TobyRoseman (Collaborator)

Thanks @ZachNagengast for the changes. It looks good to me.

I have to kick off the CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/1046916275

@Zulqurnain24

The input issue is now fixed, but I am now hitting liuliu/swift-diffusion#48 due to AttributeError: 'Namespace' object has no attribute 'merge_chunks_in_pipeline_model'

@ZachNagengast (Contributor, Author)

@TobyRoseman Looks like the CI passed, anything else needed?

@Zulqurnain24 This is because chunk_mlprogram.py has no default value for the merge_chunks_in_pipeline_model argument, and it appears to be missing from wherever it was called in your script (https://github.com/apple/ml-stable-diffusion/blob/main/python_coreml_stable_diffusion/chunk_mlprogram.py#L317). You just need to set it to a default before calling the chunk script: https://github.com/apple/ml-stable-diffusion/blob/bea04420b5958935a975d3b7aeb071bbbaa9a097/python_coreml_stable_diffusion/torch2coreml.py#L908
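
For anyone hitting the same traceback, a hedged sketch of that workaround: default the attribute on the parsed args namespace before handing it to the chunk script. The attribute name comes from the traceback; the surrounding argparse setup and entry-point wiring are assumptions, not quoted from either repo.

import argparse

# `args` stands in for the Namespace that torch2coreml.py builds and passes along
args = argparse.Namespace()
# chunk_mlprogram.py reads this attribute but defines no default for it,
# so set one before invoking the chunker:
if not hasattr(args, "merge_chunks_in_pipeline_model"):
    args.merge_chunks_in_pipeline_model = False
# ...then hand `args` to chunk_mlprogram's entry point as before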

@TobyRoseman (Collaborator)

@ZachNagengast - nothing else is needed. I will merge it after the release is finished.

@TobyRoseman self-assigned this Oct 27, 2023
jakesabathia2 previously approved these changes Oct 27, 2023
@TobyRoseman (Collaborator)

@ZachNagengast - Our release has finished. I would merge this now, but there is a conflict. Can you fix the conflict? Then I'll kick off another CI run.

@ZachNagengast (Contributor, Author)

Ok great, I'll merge in main shortly

@ZachNagengast (Contributor, Author)

@TobyRoseman All set. There was code here doing something similar, but it didn't raise an error when scale was non-null, and it only required a minimum of 3 params instead of strictly 6 or 7. Hopefully this implementation is the preferred one.

@TobyRoseman (Collaborator)

Thanks @ZachNagengast - the change looks good.

Update CI: https://gitlab.com/coremltools1/coremltools/-/pipelines/1059479892

@ZachNagengast (Contributor, Author)

@TobyRoseman Unclear why the previous CI run failed, but I just updated from main again; perhaps you can rerun it.

@TobyRoseman (Collaborator)

@ZachNagengast - the CI failures look related to your change. I don't think updating from main is going to fix it.

FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-rank=2]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-rank=3]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-rank=4]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-rank=5]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-rank=2]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-rank=3]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-rank=4]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_different_input_ranks_no_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-rank=5]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_attn_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-seq_lengths=(5, 5)-bool_mask=False]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_attn_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-seq_lengths=(5, 5)-bool_mask=False]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_attn_mask[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-seq_lengths=(7, 5)-bool_mask=False]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_toy_xformer_with_sdpa[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')-mask_as_input=True]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_toy_xformer_with_sdpa[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')-mask_as_input=True]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestScaledDotProductAttention::test_dropout_early_error_out
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestTransformer::test_transformer_encoder[compute_unit=ComputeUnit.CPU_ONLY-backend=('mlprogram', 'fp16')]
FAILED ../../envs/coremltools-py3.9/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/test/test_torch_ops.py::TestTransformer::test_transformer_encoder[compute_unit=ComputeUnit.CPU_ONLY-backend=('neuralnetwork', 'fp32')]

I suggest you try to reproduce these failures locally. Or let me know if you'd like me to try my previously mentioned fix.

@ZachNagengast (Contributor, Author)

@TobyRoseman I see the issue: there were changes that came in from the merge that I didn't notice originally. Updated to restore the pre-merge version.

@TobyRoseman (Collaborator)

The current diff makes it very difficult to understand what changes you have actually made. Please do a rebase squash on top of current main, i.e. please update this PR so there is just one commit and its parent is the tip of main.

Handle new scaled param in scaled_dot_product_attention

Fix param name

Update docstring and error type

Update format

Restore original scaled_dot_product_attention

Fix is_causal for scaled dot product attn
@ZachNagengast force-pushed the fix-scaled_dot_product_attention-inputs branch from 571e162 to 4fa16a0 on November 9, 2023 at 20:12
@ZachNagengast (Contributor, Author)

@TobyRoseman Alright, I simplified the whole thing a bit. All that needed changing was a small fix in the original implementation, where is_causal was being set to is_causal.val; I've fixed that and rebased.

@TobyRoseman (Collaborator)

Thank you. The diff is much clearer.

dropout = 0.0 if len(inputs) < 5 else inputs[4]
is_causal = False if len(inputs) < 6 else inputs[5].val
if attn_mask is not None and is_causal:
inputs = _get_inputs(context, node, expected=[6, 7])
Collaborator

We should keep the current (less strict) requirement of min_expected=3 rather than expected=[6, 7]. We should also keep the logic from the deleted code above that sets those values to defaults when they are not present.
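
A sketch of the combination being requested: the lenient min_expected=3 arity check plus the defaulting logic, with a guard against a non-default scale. Variable names follow the diff above; the error message is illustrative.

# accept anywhere from 3 to 7 inputs, defaulting the optional trailing ones
inputs = _get_inputs(context, node, min_expected=3)
q, k, v = inputs[0], inputs[1], inputs[2]
attn_mask = None if len(inputs) < 4 else inputs[3]
dropout = 0.0 if len(inputs) < 5 else inputs[4]
is_causal = False if len(inputs) < 6 else inputs[5].val
# torch 2.1's 7th input (scale) is ignored by the converter, so only None is allowed
if len(inputs) == 7 and inputs[6] is not None and inputs[6].val is not None:
    raise ValueError("scaled_dot_product_attention: non-None scale is not supported")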

Contributor Author

Done!

@ZachNagengast (Contributor, Author)

@TobyRoseman Missed one of the is_causal changes from before, which caused this error:

>       if attn_mask is not None and is_causal.val:
E       AttributeError: 'bool' object has no attribute 'val'

Fixed now.
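
The root cause, sketched from the diff above: is_causal is already unwrapped to a plain Python bool when it is read out of the inputs, so a second .val access fails.

is_causal = False if len(inputs) < 6 else inputs[5].val  # .val unwraps the MIL Var into a bool

# buggy check from the earlier revision: a plain bool has no .val, hence the AttributeError
# if attn_mask is not None and is_causal.val:

# corrected check:
if attn_mask is not None and is_causal:
    ...  # the two options conflict; handling body elided here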

@TobyRoseman (Collaborator)

CI is green. Thanks for the submission @ZachNagengast.

@TobyRoseman merged commit e72db64 into apple:main on Nov 13, 2023