On the GPU I could reproduce the benchmark from the issue you mentioned, but that benchmark does not account for the time needed to create the reversed index. Once that time is included, neither variant is consistently faster.
To test this for our case, I compared different variants of _mono2revmono on CPU and GPU (in the notebook 20_speed_improvements.ipynb). If we want to keep our flexibility (no fixed sequence length), the already implemented combination of flip and indexing was the fastest. If we fixed the sequence length beforehand, we could build our index matrices once and store them. Advanced indexing with the stored matrices would then be faster.
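To make the comparison concrete, here is a minimal sketch of the two approaches, assuming the operation boils down to reversing a tensor along its sequence dimension (dim 1). The helper names `reverse_with_flip`, `make_reverse_index`, `reverse_with_index` and `time_call` are hypothetical and are not taken from the notebook or from `_mono2revmono`; only `torch.flip`, `torch.arange` and tensor indexing are actual PyTorch calls.

```python
import time

import torch


def reverse_with_flip(x: torch.Tensor) -> torch.Tensor:
    """Reverse along the sequence dimension (dim 1) with torch.flip."""
    return torch.flip(x, dims=[1])


def make_reverse_index(seq_len: int, device=None) -> torch.Tensor:
    """Build the index [seq_len-1, ..., 1, 0]; this is the cost the original benchmark omits."""
    return torch.arange(seq_len - 1, -1, -1, device=device)


def reverse_with_index(x: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
    """Reverse along dim 1 with advanced indexing and a prebuilt index."""
    return x[:, index]


def time_call(fn, repeats: int = 100) -> float:
    """Average wall-clock seconds per call; synchronizes if CUDA is in use."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


# Small illustrative input: (batch, seq_len, features). Sizes are arbitrary.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 128, 16, device=device)

# Variant A: flip, no stored state, works for any sequence length.
t_flip = time_call(lambda: reverse_with_flip(x))

# Variant B: advanced indexing, index rebuilt on every call (flexible length).
t_index_rebuilt = time_call(
    lambda: reverse_with_index(x, make_reverse_index(x.shape[1], x.device))
)

# Variant C: advanced indexing with an index stored once (fixed length).
stored_index = make_reverse_index(x.shape[1], x.device)
t_index_stored = time_call(lambda: reverse_with_index(x, stored_index))

print(f"flip: {t_flip:.2e}s  index (rebuilt): {t_index_rebuilt:.2e}s  "
      f"index (stored): {t_index_stored:.2e}s")
```

Variant B is the fair comparison against flip when the sequence length is not fixed, since it pays for index creation on every call. Variant C corresponds to storing the index once. Which one wins will likely depend on device and tensor shape, so the timings should be rerun for the real workload.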
Flip is much slower than advanced indexing, see pytorch/pytorch#16424 (comment).