
Consider separately passing each activation and each direction's weights & bias for lstm and gru #751

Open
philloooo opened this issue Aug 16, 2024 · 3 comments

Comments

@philloooo
Contributor

philloooo commented Aug 16, 2024

This is feedback from trying to implement gru/lstm on CoreML, driven by #689.
The biases and weights are stacked together for the forward and backward directions when the op is bidirectional; similarly, the activations are passed as a single array instead of as distinct, separately named parameters.

I think it's more explicit and cleaner to follow CoreML's design, which would:

  • Pass the bias & weights for each direction separately when the op is bidirectional
  • Pass the activations separately as recurrent_activation, cell_activation, and activation (see the rough sketch after this list)
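For concreteness, a rough TypeScript-flavored sketch of the two shapes of API. The stacked form only approximates the current spec's [numDirections, 4 * hiddenSize, ...] layouts, and every name in the split variant (e.g. forwardWeight, recurrentActivation) is invented for illustration rather than proposed spelling:

```ts
// Today (direction: "both"): one operand per role, stacked along numDirections,
// and one array holding all activations.
const outputs = builder.lstm(
    input,
    weight,            // roughly [2, 4 * hiddenSize, inputSize]
    recurrentWeight,   // roughly [2, 4 * hiddenSize, hiddenSize]
    steps, hiddenSize,
    {
      bias,            // roughly [2, 4 * hiddenSize]
      recurrentBias,   // roughly [2, 4 * hiddenSize]
      direction: 'both',
      activations: ['sigmoid', 'tanh', 'tanh'],
    });

// Hypothetical split form along the lines of this proposal: per-direction
// operands and individually named activations (all names made up here).
const outputs2 = builder.lstm(
    input,
    forwardWeight,            // roughly [4 * hiddenSize, inputSize]
    forwardRecurrentWeight,   // roughly [4 * hiddenSize, hiddenSize]
    steps, hiddenSize,
    {
      backwardWeight,
      backwardRecurrentWeight,
      forwardBias,
      backwardBias,
      recurrentActivation: 'sigmoid',
      cellActivation: 'tanh',
      activation: 'tanh',
    });
```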

What do you think?

This would also help unblock the lstm/gru implementation on CoreML from depending on the outcome of the MLConstantOperand discussion.

@fdwr @huningxin

@fdwr
Collaborator

fdwr commented Sep 4, 2024

Thank you for the implementation findings. Let me digest those fields again (ironically, I have a short-term memory when it comes to LSTM's details 😅).

@shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

@shiyi9801
Contributor

> @shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

No, it would actually do the opposite for the DML backend. If the weights/biases are passed separately for each direction, then the DML backend has to do another concatenation to combine the forward and backward weights/biases. (The existing concatenation is to combine the bias and recurrent bias.) It looks like CoreML prefers separate operands for each direction while DML prefers them as a whole...

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_lstm_operator_desc
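For reference, a minimal sketch (not the actual Chromium code) of the extra re-stacking a DML-style backend would need if the forward and backward tensors arrived as separate operands; shapes and gate ordering are simplified:

```ts
// Minimal sketch: if forward/backward weights came in as separate operands, a
// DML-style backend would have to re-stack them along a direction axis before
// building its LSTM/GRU operator desc (on top of the existing bias fusion).
// Both inputs here are flattened [4 * hiddenSize, inputSize] row-major tensors.
function stackDirections(forward: Float32Array, backward: Float32Array): Float32Array {
  const out = new Float32Array(forward.length + backward.length);
  out.set(forward, 0);               // direction 0
  out.set(backward, forward.length); // direction 1
  return out;                        // roughly [2, 4 * hiddenSize, inputSize]
}
```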

/cc @miaobin for Gru

@huningxin
Contributor

Separate weights & biases might also help simplify emulation code, e.g. the Chromium TFLite backend, which needs to slice each tensor out of the combined one before using it.
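For example, a minimal sketch of the kind of slicing an emulating backend needs today, written against the WebNN builder for brevity rather than as the actual Chromium TFLite code (the helper name and index math are illustrative):

```ts
// With the stacked layout, the emulation has to cut the per-direction (and,
// for TFLite, per-gate) pieces back out of the combined weight before use.
function sliceGateWeights(builder: MLGraphBuilder, weight: MLOperand,
                          direction: number, hiddenSize: number,
                          inputSize: number): MLOperand[] {
  // weight: [numDirections, 4 * hiddenSize, inputSize]
  const gates: MLOperand[] = [];
  for (let gate = 0; gate < 4; ++gate) {
    gates.push(builder.slice(weight,
                             [direction, gate * hiddenSize, 0],  // starts
                             [1, hiddenSize, inputSize]));       // sizes
  }
  return gates;  // four [1, hiddenSize, inputSize] operands, one per gate
}
```

With per-direction (and per-gate) operands, these slices would simply not be needed.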

/cc @fujunwei
