Expose attention weights' head dimension #830
Comments
Sorry, it's not totally clear to me what change you're suggesting. Can you expand?
The `Q`, `K`, `V` projections' heads: I feel that the head dimension should be explicit here, so the weight shape would be 3D. This might be a bit tricky to incorporate, I suppose, but it's definitely quite helpful, for example for sharding along the head dimension or for weight sharing.
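If the projection weights carried an explicit head axis, sharding along it becomes straightforward with JAX's sharding machinery. A minimal sketch, assuming a hypothetical 3D layout `(num_heads, head_dim, embed_size)` (not Equinox's current layout) and illustrative sizes:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Illustrative sizes, not taken from the issue.
num_heads, head_dim, embed_size = 8, 64, 512

key = jax.random.PRNGKey(0)
w_3d = jax.random.normal(key, (num_heads, head_dim, embed_size))

# Build a 1D device mesh and lay the weight out along the head dimension.
mesh = Mesh(np.array(jax.devices()), axis_names=("heads",))
sharded_w = jax.device_put(w_3d, NamedSharding(mesh, P("heads", None, None)))

# The per-head projection is then a plain einsum; each device only touches
# its own slice of heads.
x = jnp.ones((16, embed_size))               # (seq_len, embed_size)
q = jnp.einsum("se,hde->shd", x, sharded_w)  # (seq_len, num_heads, head_dim)
print(q.shape)
```

With the current fused 2D layout `(num_heads * head_dim, embed_size)`, there is no axis that corresponds to heads, so expressing this sharding requires knowing how the heads were packed.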
Ah, I see what you're saying! For the specific purpose of sharding, I think what we should do here depends on what you and dlwh come up with in #825.
🤷 Even with a specific sharding API, ideally one should only need to deal with the model itself. I suppose one could add a check in MHA to fold the heads dimension if the array is 3D... but that seems janky.
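For reference, the "fold if 3D" check mentioned above might look roughly like this (a hypothetical helper, not existing Equinox code):

```python
import jax.numpy as jnp

def fold_heads(weight: jnp.ndarray) -> jnp.ndarray:
    """Collapse a heads-explicit 3D weight back to the fused 2D layout."""
    if weight.ndim == 3:  # (num_heads, head_dim, embed_size)
        num_heads, head_dim, embed_size = weight.shape
        return weight.reshape(num_heads * head_dim, embed_size)
    return weight         # assumed already (num_heads * head_dim, embed_size)
```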
I wonder if, at this point, it might be better to add another, optimized version of MHA. Users who want to explicitly adopt the newer features could switch over. Or I suppose one needs a separate lib of such modules.
Currently, Equinox handles attention heads opaquely: it reshapes Q/K/V through the `_project` method to add the heads dimension. However, sharding via the heads dimension is commonly used when parallelizing the model. I feel that the `{query | key | value}_proj` projections should be split to expose the head dimension. WDYT?
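To make the request concrete, here is a rough sketch of the current layout versus the proposed one. The shapes are assumptions based on `eqx.nn.Linear` storing its weight as `(out_features, in_features)`, and the reshape at the end is purely illustrative, not a proposed API:

```python
import equinox as eqx
import jax

# Illustrative sizes, not taken from the issue.
num_heads, qk_size, query_size = 8, 64, 512

mha = eqx.nn.MultiheadAttention(
    num_heads=num_heads,
    query_size=query_size,
    qk_size=qk_size,
    key=jax.random.PRNGKey(0),
)

# Today: heads are fused into the output axis of each projection, and
# `_project` reshapes activations to add the head dimension afterwards.
print(mha.query_proj.weight.shape)  # (num_heads * qk_size, query_size)

# Proposed: store the weight with an explicit leading head axis instead,
# so it can be sharded (or tied) per head without any reshaping tricks.
w_split = mha.query_proj.weight.reshape(num_heads, qk_size, query_size)
print(w_split.shape)                # (num_heads, qk_size, query_size)
```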