SVE Instruction List by Dougall Johnson

See "FMLALLBB (vectors)" in the exploration tools

FMLALLBB (vectors): 8-bit floating-point multiply-add to single-precision (bottom bottom)

FMLALLBB Zda.S, Zn.B, Zm.B (SVE2+FP8FMA+NS) (SSVE-FP8FMA)

128-bit to 2048-bit SVE

For each 32-bit float element of Zda, multiply the 8-bit floats at byte position 0 of the corresponding 4-byte groups in Zn and Zm, and add the product to that element of Zda. The FP8 format of each 8-bit source operand is selected independently by FPMR. The operation is the same at every vector length (128, 256, 512, 1024, and 2048 bits); only the number of 32-bit elements changes.
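
The element-wise behaviour described above can be illustrated with a small scalar reference model. This is a hedged sketch rather than the architectural pseudocode: the function names `decode_fp8`, `to_f32`, and `fmlallbb` are hypothetical, the two source formats are assumed to be the OFP8 E5M2 and E4M3 encodings, and the `fmt_n` / `fmt_m` parameters merely stand in for the per-operand selection made by FPMR. Any FPMR-controlled scaling, rounding-mode control, and exact NaN handling of the real instruction are not modelled.

```python
import math
import struct

def decode_fp8(byte, fmt):
    """Decode one FP8 byte to a Python float (assumes OFP8 E5M2 / E4M3)."""
    if fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    elif fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    else:
        raise ValueError(f"unknown FP8 format: {fmt}")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    exp_max = (1 << exp_bits) - 1
    if fmt == "E5M2" and exp == exp_max:
        # E5M2 follows IEEE conventions: all-ones exponent is Inf or NaN.
        return sign * math.inf if man == 0 else math.nan
    if fmt == "E4M3" and exp == exp_max and man == 7:
        # E4M3 has no infinities; only S.1111.111 encodes NaN.
        return math.nan
    if exp == 0:
        # Zero or subnormal: no implicit leading one.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite value too large for float32: model as overflow to infinity.
        return math.copysign(math.inf, x)

def fmlallbb(zda, zn, zm, fmt_n="E4M3", fmt_m="E4M3"):
    """Model of the per-element operation for any vector length.

    zda: list of float32 accumulators (one per 32-bit element)
    zn, zm: byte sequences, four bytes per element of zda
    fmt_n, fmt_m: stand-ins for the formats that FPMR selects (assumption)
    """
    assert len(zn) == len(zm) == 4 * len(zda)
    result = []
    for i, acc in enumerate(zda):
        # "Bottom bottom": only byte 0 of each 4-byte group is used.
        a = decode_fp8(zn[4 * i], fmt_n)
        b = decode_fp8(zm[4 * i], fmt_m)
        result.append(to_f32(acc + a * b))
    return result

# 128-bit example: four 32-bit accumulators, sixteen source bytes each.
# The 0xFF bytes at positions 1..3 of each group are ignored.
zn = bytes([0x38, 0xFF, 0xFF, 0xFF,   # 1.0 in E4M3
            0x40, 0xFF, 0xFF, 0xFF,   # 2.0
            0x3C, 0xFF, 0xFF, 0xFF,   # 1.5
            0xB8, 0xFF, 0xFF, 0xFF])  # -1.0
zm = bytes([0x40, 0x00, 0x00, 0x00] * 4)  # 2.0 in every group
print(fmlallbb([1.0, 0.0, -2.0, 0.5], zn, zm))  # [3.0, 4.0, 1.0, -1.5]
```

Passing different values for `fmt_n` and `fmt_m` mimics the case where the two source operands use different FP8 formats. A longer vector is modelled simply by passing more accumulators and four times as many source bytes.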

Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.