SVE Instruction List by Dougall Johnson
FMMLA (widening, FP16 to FP32): Half-precision matrix multiply-accumulate to single-precision
FMMLA Zda.S, Zn.H, Zm.H (SVE+F16F32MM+NS
128-bit SVE

Within each 128-bit segment, interpreting the 16-bit floats from (1) and (2) as 2-by-4 and 4-by-2 matrices respectively, and the
32-bit floats from (3) as a 2-by-2 matrix, multiply (1) by (2), add the resulting 2-by-2 matrix to (3), and write the result to (4).
Here (1) is Zn, (2) is Zm, and (3) and (4) are Zda as the accumulator input and the destination. See the documentation for the
exact order of operations.
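The per-segment operation described above can be sketched as a small reference model. This is only an illustration under stated assumptions, not the architectural definition: the function names are hypothetical, the 2-by-4 and 2-by-2 matrices are assumed to be stored row by row, the 4-by-2 matrix is assumed to be stored column by column, and Python floats are used throughout, so half-precision and single-precision rounding and the exact order of operations are not modelled.

```python
def fmmla_segment(acc, a, b):
    """Model one 128-bit segment (hypothetical helper, not a real API).

    acc: 4 values, the 2x2 accumulator, row-major (assumed layout).
    a:   8 values, the 2x4 matrix, row-major (assumed layout).
    b:   8 values, the 4x2 matrix stored column by column (assumed
         layout), so b[4 * j + k] is the element at row k, column j.
    """
    out = list(acc)
    for i in range(2):          # row of the 2x2 result
        for j in range(2):      # column of the 2x2 result
            # 4-way dot product of row i of a with column j of b.
            dot = sum(a[4 * i + k] * b[4 * j + k] for k in range(4))
            out[2 * i + j] = acc[2 * i + j] + dot
    return out


def fmmla(zda, zn, zm):
    """Apply the segment operation independently to each 128-bit segment.

    zda holds 4 values per segment; zn and zm hold 8 values per segment.
    """
    assert len(zn) == len(zm) == 2 * len(zda) and len(zda) % 4 == 0
    result = []
    for s in range(len(zda) // 4):
        result += fmmla_segment(
            zda[4 * s:4 * s + 4],
            zn[8 * s:8 * s + 8],
            zm[8 * s:8 * s + 8],
        )
    return result


# One segment: the columns of the 4x2 matrix pick out the first two
# elements of each row of the 2x4 matrix.
print(fmmla([10, 20, 30, 40],
            [1, 2, 3, 4, 5, 6, 7, 8],
            [1, 0, 0, 0, 0, 1, 0, 0]))  # prints [11, 22, 35, 46]
```

Because each segment is handled on its own, the same model covers longer vectors by passing more segments; nothing crosses a 128-bit boundary.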
256-bit, 512-bit, 1024-bit, and 2048-bit SVE

The operation is the same at every vector length: each 128-bit segment is processed independently, exactly as described for
128-bit SVE.
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.