Instructions.md

Notation

x denotes a 64-byte span from the X register pool, accessed as a vector of lanes. The lanes are indexed by i.

y denotes a 64-byte span from the Y register pool, accessed as a vector of lanes. The lanes are indexed by j (or by i for vector operations).

z denotes the entire set of 64 Z registers, each 64 bytes wide, with 2D indexing. When only one index variable is used, [_] denotes that the other index comes from the instruction operand (typically a bitfield called "Z row" or "Z column").

f denotes some function. f(x, y) = x * y is usually one option for binary functions. f_s(z) = z >> s is usually one option for unary functions.

Some instructions can operate in multiple distinct modes. In these cases, the instruction name is followed by the relevant mode bits. When the mode field is a single bit #N, this is denoted as "(N=0)" or "(N=1)". When the mode field is multiple bits starting at bit #N, this is denoted as "(N=M)" or "(N≠M)" or "(N≤M)" or "(N≥M)".
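
As an illustration of the mode-bit notation, the sketch below extracts a field from a 64-bit operand and tests it the way "(63=1)" or "(47=4)" would. The helper name `bits`, the example operand, and the 6-bit width used for the field at bit #47 are assumptions made for the example; the notation above does not state field widths.

```python
def bits(operand: int, start: int, width: int = 1) -> int:
    """Return the `width`-bit field of `operand` whose lowest bit is bit #start."""
    return (operand >> start) & ((1 << width) - 1)

# Hypothetical operand: bit 63 set, and the value 4 in the field starting at bit 47.
operand = (1 << 63) | (4 << 47)

# "(63=1)": a single-bit mode field at bit #63.
mode_63_is_1 = bits(operand, 63) == 1

# "(47=4)": a multi-bit mode field starting at bit #47.
# The width of 6 bits is an assumption for this sketch.
mode_47_is_4 = bits(operand, 47, 6) == 4

print(mode_63_is_1, mode_47_is_4)
```

The same helper covers the "(N≠M)", "(N≤M)" and "(N≥M)" forms by swapping the comparison operator.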

Setup and clear

| Instruction | General theme | Notes |
|---|---|---|
| `set` | Set up AMX state | Raises invalid instruction exception if already set up. All registers set to zero. |
| `clr` | Clear AMX state | All registers set to uninitialised, no longer need saving/restoring on context switch. |

Memory

| Instruction | General theme | Optional special features |
|---|---|---|
| `ldx` | `x[i] = memory[i]` | Load pair |
| `ldy` | `y[i] = memory[i]` | Load pair |
| `ldz`<br>`ldzi` | `z[_][i] = memory[i]` | Load pair, interleaved Z |
| `stx` | `memory[i] = x[i]` | Store pair |
| `sty` | `memory[i] = y[i]` | Store pair |
| `stz`<br>`stzi` | `memory[i] = z[_][i]` | Store pair, interleaved Z |
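
The load and store themes above can be modelled as byte copies between memory and a register pool. This is a minimal sketch, not a description of the hardware: the pool size of 8 registers, the reading of "pair" as two consecutive 64-byte registers, the wrap-around at the end of the pool, and the names `load` and `store` are all assumptions.

```python
SPAN = 64        # bytes per x / y register, from the Notation section
POOL_REGS = 8    # assumption: number of registers in the pool

def load(pool: bytearray, reg: int, memory: bytearray, addr: int, pair: bool = False) -> None:
    """pool[reg][i] = memory[addr + i]; with pair=True, also fill the next register."""
    count = 2 if pair else 1  # assumption: "pair" means two consecutive registers
    for r in range(count):
        dst = ((reg + r) % POOL_REGS) * SPAN  # assumption: register index wraps
        src = addr + r * SPAN
        pool[dst:dst + SPAN] = memory[src:src + SPAN]

def store(pool: bytearray, reg: int, memory: bytearray, addr: int, pair: bool = False) -> None:
    """memory[addr + i] = pool[reg][i]; with pair=True, also store the next register."""
    count = 2 if pair else 1
    for r in range(count):
        src = ((reg + r) % POOL_REGS) * SPAN
        dst = addr + r * SPAN
        memory[dst:dst + SPAN] = pool[src:src + SPAN]

memory = bytearray(range(256))
x_pool = bytearray(SPAN * POOL_REGS)
load(x_pool, 2, memory, 0, pair=True)      # fills registers 2 and 3
out = bytearray(256)
store(x_pool, 2, out, 128, pair=True)      # writes 128 bytes at offset 128
```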

Floating-point matrix arithmetic (i.e. outer products), writing to z

| Instruction | General theme | Writemask | Optional special features |
|---|---|---|---|
| `fma64` (63=0)<br>`fma32` (63=0)<br>`fma16` (63=0) | `z[j][i] += x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable |
| `fms64` (63=0)<br>`fms32` (63=0)<br>`fms16` (63=0) | `z[j][i] -= x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable |
| `matfp` | `z[j][i] ±= f(x[i], y[j])` | 9 bit X, 9 bit Y | Indexed X or Y, shuffle X, shuffle Y, positive selection |
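
The matrix theme `z[j][i] += x[i] * y[j]` is an outer-product accumulate: every x lane is multiplied by every y lane. The sketch below models it with plain Python lists; the function names are hypothetical, and write masks, lane widths and the optional features listed in the table are deliberately left out.

```python
def fma_matrix(z, x, y):
    """Outer-product accumulate: z[j][i] += x[i] * y[j] for all lane pairs."""
    for j, yj in enumerate(y):
        for i, xi in enumerate(x):
            z[j][i] += xi * yj

def fms_matrix(z, x, y):
    """Outer-product subtract: z[j][i] -= x[i] * y[j] for all lane pairs."""
    for j, yj in enumerate(y):
        for i, xi in enumerate(x):
            z[j][i] -= xi * yj

x = [1.0, 2.0]
y = [10.0, 100.0]
z = [[0.0, 0.0], [0.0, 0.0]]
fma_matrix(z, x, y)
print(z)  # [[10.0, 20.0], [100.0, 200.0]]
```

Row j of the result is x scaled by y[j], which is why a single instruction touches many Z registers at once.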

Integer matrix arithmetic (i.e. outer products), writing to z

| Instruction | General theme | Writemask | Optional special features |
|---|---|---|---|
| `mac16` (63=0) | `z[j][i] += x[i] * y[j]` | 7 bit X, 7 bit Y | X/Y/Z input disable, right shift |
| `matint` (47≠4) | `z[j][i] ±= f(x[i], y[j])` | 9 bit X or Y | Indexed X or Y, shuffle X, shuffle Y, right shift, sqrdmlah, popcnt |
| `matint` (47=4) | `z[j][i] = f(z[j][i])` | 9 bit X or Y | Right shift, saturation |
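
For the unary form `z[j][i] = f(z[j][i])`, the table lists right shift and saturation as options. The sketch below shows one plausible combination, an arithmetic right shift followed by clamping to a signed range. The order of the two steps, the signed 16-bit output range, and the function names are assumptions made for illustration.

```python
def shift_saturate(value: int, shift: int, out_bits: int = 16) -> int:
    """Arithmetic right shift, then clamp to the signed `out_bits`-bit range."""
    lo = -(1 << (out_bits - 1))
    hi = (1 << (out_bits - 1)) - 1
    shifted = value >> shift  # Python's >> is an arithmetic shift for negative ints
    return max(lo, min(hi, shifted))

def apply_unary(z, shift):
    """z[j][i] = f(z[j][i]) applied in place to every element."""
    for row in z:
        for i, value in enumerate(row):
            row[i] = shift_saturate(value, shift)

z = [[100, 1 << 20], [-7, -(1 << 20)]]
apply_unary(z, 2)
print(z)  # [[25, 32767], [-2, -32768]]
```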

Floating-point vector arithmetic (i.e. pointwise products), writing to z

| Instruction | General theme | Writemask | Optional special features |
|---|---|---|---|
| `fma64` (63=1)<br>`fma32` (63=1)<br>`fma16` (63=1) | `z[_][i] += x[i] * y[i]` | 7 bit | X/Y/Z input disable |
| `fms64` (63=1)<br>`fms32` (63=1)<br>`fms16` (63=1) | `z[_][i] -= x[i] * y[i]` | 7 bit | X/Y/Z input disable |
| `vecfp` | `z[_][i] ±= f(x[i], y[i])` | 9 bit | Indexed X or Y, shuffle X, shuffle Y, broadcast Y element, positive selection, min, max |
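
In vector mode the same multiply-accumulate is pointwise: lane i of x meets only lane i of y, and the result goes to a single Z row chosen by the operand. A minimal sketch of `z[_][i] += x[i] * y[i]` follows; the name `fma_vector` and the `z_row` parameter (standing in for the "Z row" operand field) are hypothetical.

```python
def fma_vector(z, z_row, x, y):
    """Pointwise accumulate into one row: z[z_row][i] += x[i] * y[i]."""
    for i in range(len(x)):
        z[z_row][i] += x[i] * y[i]

z = [[0.0] * 3 for _ in range(2)]
fma_vector(z, 1, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(z)  # [[0.0, 0.0, 0.0], [4.0, 10.0, 18.0]]
```

Only the selected row changes, in contrast to the matrix form, where every row j receives a contribution.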

Integer vector arithmetic (i.e. pointwise products), writing to z

| Instruction | General theme | Writemask | Optional special features |
|---|---|---|---|
| `mac16` (63=1) | `z[_][i] += x[i] * y[i]` | 7 bit | X/Y/Z input disable, right shift |
| `vecint` (47≠4) | `z[_][i] ±= f(x[i], y[i])` | 9 bit | Indexed X or Y, shuffle X, shuffle Y, broadcast Y element, right shift, sqrdmlah |
| `vecint` (47=4) | `z[_][i] = f(z[_][i])` | 9 bit | Right shift, saturation |

Vector data movement, writing to x or y

| Instruction | General theme | Writemask | Optional special features |
|---|---|---|---|
| `extrx` | `x[i] = y[i]` | None | |
| `extry` | `y[i] = x[i]` | None | |
| `extrh` (26=0) | `x[i] = z[_][i]` | 7 bit | |
| `extrh` (26=1,10=0) | `x[i] = f(z[_][i])` | 9 bit | Integer right shift, integer saturation |
| `extrv` (26=1,10=0) | `x[j] = f(z[j][_])` | 9 bit | Integer right shift, integer saturation |
| `extrv` (26=0) | `y[j] = z[j][_]` | 7 bit | |
| `extrv` (26=1,10=1) | `y[j] = f(z[j][_])` | 9 bit | Integer right shift, integer saturation |
| `extrh` (26=1,10=1) | `y[i] = f(z[_][i])` | 9 bit | Integer right shift, integer saturation |
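
The difference between the two extraction directions is which index is fixed by the operand. The sketch below models `z[_][i]` as reading along one row and `z[j][_]` as reading down one column of a 2D list. The function bodies are an illustration only; write masks and the optional `f` (shift, saturation) are omitted.

```python
def extrh(z, z_row):
    """Horizontal extract: result[i] = z[z_row][i] (row fixed by the operand)."""
    return list(z[z_row])

def extrv(z, z_column):
    """Vertical extract: result[j] = z[j][z_column] (column fixed by the operand)."""
    return [row[z_column] for row in z]

z = [[1, 2],
     [3, 4]]
print(extrh(z, 1))  # [3, 4]
print(extrv(z, 1))  # [2, 4]
```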

Vector other

| Instruction | General theme | Notes |
|---|---|---|
| `genlut` (53≤6) | Generate indices for indexed load | For use by `matfp` / `matint` / `vecfp` / `vecint` / `genlut` (53≥7) |
| `genlut` (53≥7) | Perform indexed load | Can write to any of `x` or `y` or `z` |