
Consider separately passing each activation and each direction's weights & bias for lstm and gru #751

Open
philloooo opened this issue Aug 16, 2024 · 3 comments

Comments

@philloooo
Contributor

philloooo commented Aug 16, 2024

This is feedback from trying to implement gru/lstm on CoreML, driven by #689.
The biases and weights are stacked together for the forward and backward directions when the op is bidirectional; similarly, the activations are passed as a single array instead of as distinct, separately named parameters.

I think it's more explicit and cleaner to follow CoreML's design, which would:

  • Pass the bias & weights for each direction separately when the op is bidirectional
  • Pass the activations separately as recurrent_activation, cell_activation, and activation (see the rough sketch after this list)
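For concreteness, a rough TypeScript-flavored sketch of the two shapes of API. The stacked form only approximates the current spec's [numDirections, 4 * hiddenSize, ...] layouts, and every name in the split variant (e.g. forwardWeight, recurrentActivation) is invented for illustration rather than proposed spelling:

```ts
// Today (direction: "both"): one operand per role, stacked along numDirections,
// and one array holding all activations.
const outputs = builder.lstm(
    input,
    weight,            // roughly [2, 4 * hiddenSize, inputSize]
    recurrentWeight,   // roughly [2, 4 * hiddenSize, hiddenSize]
    steps, hiddenSize,
    {
      bias,            // roughly [2, 4 * hiddenSize]
      recurrentBias,   // roughly [2, 4 * hiddenSize]
      direction: 'both',
      activations: ['sigmoid', 'tanh', 'tanh'],
    });

// Hypothetical split form along the lines of this proposal: per-direction
// operands and individually named activations (all names made up here).
const outputs2 = builder.lstm(
    input,
    forwardWeight,            // roughly [4 * hiddenSize, inputSize]
    forwardRecurrentWeight,   // roughly [4 * hiddenSize, hiddenSize]
    steps, hiddenSize,
    {
      backwardWeight,
      backwardRecurrentWeight,
      forwardBias,
      backwardBias,
      recurrentActivation: 'sigmoid',
      cellActivation: 'tanh',
      activation: 'tanh',
    });
```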

What do you think?

This would also help unblock the lstm/gru implementation on CoreML from depending on the outcome of the MLConstantOperand discussion.

@fdwr @huningxin

@fdwr
Collaborator

fdwr commented Sep 4, 2024

Thank you for the implementation findings. Let me digest those fields again (ironically, I have a short-term memory when it comes to LSTM's details 😅).

@shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

@shiyi9801
Contributor

> @shiyi9801, do you think this would have also helped your implementation? I recall you doing some concatenation manipulation here https://chromium-review.googlesource.com/c/chromium/src/+/5320673 and here https://chromium-review.googlesource.com/c/chromium/src/+/5339174.

No, it would actually do the opposite for the DML backend. If the weights/biases are passed separately for each direction, then the DML backend has to do another concatenation to combine the forward and backward weights/biases. (The existing concatenation is to combine the bias and recurrent bias.) It looks like CoreML prefers separate operands for each direction while DML prefers them as a whole...

https://learn.microsoft.com/en-us/windows/win32/api/directml/ns-directml-dml_lstm_operator_desc
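For reference, a minimal sketch (not the actual Chromium code) of the extra re-stacking a DML-style backend would need if the forward and backward tensors arrived as separate operands; shapes and gate ordering are simplified:

```ts
// Minimal sketch: if forward/backward weights came in as separate operands, a
// DML-style backend would have to re-stack them along a direction axis before
// building its LSTM/GRU operator desc (on top of the existing bias fusion).
// Both inputs here are flattened [4 * hiddenSize, inputSize] row-major tensors.
function stackDirections(forward: Float32Array, backward: Float32Array): Float32Array {
  const out = new Float32Array(forward.length + backward.length);
  out.set(forward, 0);               // direction 0
  out.set(backward, forward.length); // direction 1
  return out;                        // roughly [2, 4 * hiddenSize, inputSize]
}
```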

/cc @miaobin for Gru

@huningxin
Contributor

Separate weights & biases might also help simplify emulation code, e.g. the Chromium TFLite backend, which needs to slice each tensor out of the combined one before using it.
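For example, a minimal sketch of the kind of slicing an emulating backend needs today, written against the WebNN builder for brevity rather than as the actual Chromium TFLite code (the helper name and index math are illustrative):

```ts
// With the stacked layout, the emulation has to cut the per-direction (and,
// for TFLite, per-gate) pieces back out of the combined weight before use.
function sliceGateWeights(builder: MLGraphBuilder, weight: MLOperand,
                          direction: number, hiddenSize: number,
                          inputSize: number): MLOperand[] {
  // weight: [numDirections, 4 * hiddenSize, inputSize]
  const gates: MLOperand[] = [];
  for (let gate = 0; gate < 4; ++gate) {
    gates.push(builder.slice(weight,
                             [direction, gate * hiddenSize, 0],  // starts
                             [1, hiddenSize, inputSize]));       // sizes
  }
  return gates;  // four [1, hiddenSize, inputSize] operands, one per gate
}
```

With per-direction (and per-gate) operands, these slices would simply not be needed.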

/cc @fujunwei
