SVE Instruction List by Dougall Johnson
BFDOT (vectors): BFloat16 floating-point dot product
BFDOT Zda.S, Zn.H, Zm.H (SVE+BF16) (SME+BF16)
svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
128-bit SVE
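For each 32-bit lane, take the pair of adjacent BFloat16 values from (1) and the matching pair from (2), compute their two-element dot product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. A 128-bit vector holds four such lanes (eight BFloat16 values per source). See the documentation for the exact order of operations.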
256-bit SVE
Same operation as the 128-bit case, with 8 32-bit lanes (16 BFloat16 values per source).
512-bit SVE
Same operation as the 128-bit case, with 16 32-bit lanes (32 BFloat16 values per source).
Larger sizes
1024-bit SVE
Same operation as the 128-bit case, with 32 32-bit lanes (64 BFloat16 values per source).
2048-bit SVE
Same operation as the 128-bit case, with 64 32-bit lanes (128 BFloat16 values per source).
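The per-lane operation described above can be sketched as a portable scalar model. This is only an illustration, not the instruction's exact definition: the helper names bf16_to_f32 and bfdot_ref are hypothetical, the BFloat16 inputs are assumed to be passed as raw 16-bit patterns, and ordinary C float arithmetic is used, so the instruction's specific rounding and order of operations are not reproduced.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a raw BFloat16 bit pattern to float. BFloat16 has the same layout as
   the upper 16 bits of an IEEE-754 binary32 value, so a shift is enough. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float f;
    memcpy(&f, &wide, sizeof f);
    return f;
}

/* Hypothetical scalar model of BFDOT (vectors).
   acc   : `lanes` 32-bit float accumulators, updated in place
   zn, zm: 2 * `lanes` raw BFloat16 values each
   Assumption: plain float multiplies and adds; the real instruction's
   rounding behaviour is NOT modelled here. */
static void bfdot_ref(float *acc, const uint16_t *zn, const uint16_t *zm,
                      size_t lanes) {
    for (size_t i = 0; i < lanes; i++) {
        float p0 = bf16_to_f32(zn[2 * i])     * bf16_to_f32(zm[2 * i]);
        float p1 = bf16_to_f32(zn[2 * i + 1]) * bf16_to_f32(zm[2 * i + 1]);
        acc[i] += p0 + p1;
    }
}
```

With lanes set to 4, 8, 16, 32 or 64 this mirrors the 128-bit through 2048-bit cases. On hardware with SVE and BF16 support, the intrinsic listed above, svbfdot[_f32](op1, op2, op3), takes the accumulator as op1 and the two BFloat16 vectors as op2 and op3.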
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.