in-place (scaled) matrix transposition: imatcopy #1017

rileyjmurray · 2024-05-21T15:13:15Z

Intel MKL has a useful function for (scaled) in-place transposition:

https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-1/mkl-imatcopy.html

I raised the issue of having this function as a utility in LAPACK proper, and Julian (@langou) expressed support for that. I'm making a note of it here so we don't lose track of this.

thijssteel · 2024-05-22T07:52:59Z

I had a look at in-place matrix transposes at some point, but they are far from trivial to implement.

If the matrix is square, it is just a question of swapping A(i,j) and A(j,i). A simple recursive implementation is even reasonably efficient.

If the matrix is not square, the cycle lengths can be longer, so we almost certainly require some memory (not just for an efficient implementation, but for a reference too).

ilayn · 2024-05-22T07:58:06Z

Indeed cache-oblivious transpose is really hard to get it right and quite challenging in terms of low-level decisions. But I would be really happy if we get it. I do think it won't be as performant as the copy-transpose but there is still a lot to reap compared to a double loop.

mgates3 · 2024-05-22T15:21:37Z

copy is an odd name and description for an in-place operation. From Intel's docs, "A transposition operation can be a normal matrix copy, a transposition, a conjugate transposition, or just a conjugation. The operation is defined as follows:

AB := alpha*op(AB)."

(emphasis mine)

Since the source and destination are both AB, I gather that "a normal matrix copy" (trans=N) is basically changing the stride in place, from lda to ldb? (And applying scaling.)

Gustavson has several papers on in-place transpose, e.g., https://doi.org/10.1145/2168773.2168775. Something akin to this in implemented in PLASMA.

rileyjmurray · 2024-05-24T16:48:13Z

@mgates3, while I agree that imatcopy is not a great name, it's a fairly common one:

Intel (MKL/OneAPI): https://spec.oneapi.io/versions/1.2-rev-1/elements/oneMKL/source/domains/blas/imatcopy.html
AMD (AOCL/BLIS): https://github.com/search?q=repo%3Aamd%2Fblis%20imatcopy&type=code
Arm (APL): https://developer.arm.com/documentation/101004/2404/BLAS-Basic-Linear-Algebra-Subprograms/BLAS-Extensions/dimatcopy?lang=en

In situations like these I think it's preferable to go with the crowd, and just have good documentation about what can be done with the function.

@thijssteel and @ilayn: it'd be fine with me to use some workspace. When I said in-place, I really just meant "don't force the user to make a full copy."

mgates3 · 2024-05-24T18:19:24Z

@rileyjmurray Respectfully, I disagree. If Reference BLAS / LAPACK is a (de facto) standard, it should be reasonably self-consistent. imatcopy and omatcopy don't follow the BLAS naming scheme in any way, though the arguments seem conforming. If we propose a standard name, it should be trivial for libraries that have imatcopy to add a new name.

This goes to my point on gemmt and batch gemm, that we as a community need a mechanism to propose and adopt new standard routines.

rileyjmurray · 2024-05-24T18:39:50Z

@mgates3, fair enough! I don't actually have a strong opinion on what the function should be called.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

in-place (scaled) matrix transposition: imatcopy #1017

in-place (scaled) matrix transposition: imatcopy #1017

rileyjmurray commented May 21, 2024

thijssteel commented May 22, 2024

ilayn commented May 22, 2024

mgates3 commented May 22, 2024

rileyjmurray commented May 24, 2024

mgates3 commented May 24, 2024

rileyjmurray commented May 24, 2024

in-place (scaled) matrix transposition: imatcopy #1017

in-place (scaled) matrix transposition: imatcopy #1017

Comments

rileyjmurray commented May 21, 2024

thijssteel commented May 22, 2024

ilayn commented May 22, 2024

mgates3 commented May 22, 2024

rileyjmurray commented May 24, 2024

mgates3 commented May 24, 2024

rileyjmurray commented May 24, 2024