SVE Instruction List by Dougall Johnson
See "BFDOT (vectors)" in the exploration tools

BFDOT (vectors): BFloat16 floating-point dot product

BFDOT Zda.S, Zn.H, Zm.H (SVE+BF16) (SME+BF16)
svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

128-bit SVE

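For each 32-bit lane, take the pair of adjacent BFloat16 elements from (1) and the corresponding pair from (2) (the Zn and Zm sources, op2 and op3), compute their two-element dot product, then add the result to the corresponding 32-bit float accumulator from (3) (Zda on input, op1), setting (4), the new value of Zda, to the total. See the documentation for the exact order of operations.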
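The per-lane behaviour described above can be sketched as a scalar reference model in portable C. This is only an illustration under stated assumptions: the names bf16_to_f32, f32_to_bf16_trunc and bfdot_model are hypothetical helpers, not part of ACLE, and the model uses ordinary float arithmetic, so it does not reproduce the instruction's exact rounding, intermediate precision, or denormal handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a BFloat16 bit pattern to float. BFloat16 is the upper
   16 bits of an IEEE-754 binary32 value, so this is exact. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);
    return f;
}

/* Hypothetical helper for building inputs: narrows by truncation,
   which is exact only for values already representable in BFloat16. */
static uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Hypothetical scalar model of BFDOT (vectors).
   zda: `lanes` float accumulators, updated in place.
   zn, zm: 2 * `lanes` BFloat16 bit patterns each.
   Assumption: plain float multiply and add; the real instruction
   defines its own order of operations and rounding. */
static void bfdot_model(float *zda, const uint16_t *zn,
                        const uint16_t *zm, size_t lanes) {
    assert(zda != NULL && zn != NULL && zm != NULL);
    for (size_t i = 0; i < lanes; i++) {
        /* Two-element dot product of the BFloat16 pair in lane i. */
        float dot = bf16_to_f32(zn[2 * i]) * bf16_to_f32(zm[2 * i])
                  + bf16_to_f32(zn[2 * i + 1]) * bf16_to_f32(zm[2 * i + 1]);
        zda[i] += dot;
    }
}
```

For a 128-bit vector, lanes would be 4, with zn and zm each holding 8 BFloat16 elements. On real hardware the same step is a single call to svbfdot[_f32](op1, op2, op3).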

256-bit, 512-bit, 1024-bit and 2048-bit SVE

The operation is the same as for 128-bit SVE at every vector length; only the number of 32-bit accumulator lanes changes (each lane consumes two BFloat16 elements from each source):

256-bit SVE: 8 lanes
512-bit SVE: 16 lanes
1024-bit SVE: 32 lanes
2048-bit SVE: 64 lanes

Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.