SVE Instruction List by Dougall Johnson
BFDOT (indexed): BFloat16 floating-point indexed dot product
BFDOT Zda.S, Zn.H, Zm.H[imm] (SVE+BF16) (SME+BF16)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)
128-bit SVE
For each pair of BFloat16s from (1) and (2), compute the dot-product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. Within each 128-bit segment, the pair of values used from (1) is specified by imm. See the documentation for the exact order of operations.
256-bit SVE
For each pair of BFloat16s from (1) and (2), compute the dot-product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. Within each 128-bit segment, the pair of values used from (1) is specified by imm. See the documentation for the exact order of operations.
512-bit SVE
For each pair of BFloat16s from (1) and (2), compute the dot-product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. Within each 128-bit segment, the pair of values used from (1) is specified by imm. See the documentation for the exact order of operations.
Larger sizes
1024-bit SVE
For each pair of BFloat16s from (1) and (2), compute the dot-product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. Within each 128-bit segment, the pair of values used from (1) is specified by imm. See the documentation for the exact order of operations.
2048-bit SVE
For each pair of BFloat16s from (1) and (2), compute the dot-product, then add the result to the corresponding 32-bit float accumulator from (3), setting (4) to the total. Within each 128-bit segment, the pair of values used from (1) is specified by imm. See the documentation for the exact order of operations.
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.