SVE Instruction List by Dougall Johnson
BFMLSLB (vectors): BFloat16 multiply-subtract from single-precision (bottom)
BFMLSLB Zda.S, Zn.H, Zm.H (SVE2.1 or SME2)
svfloat32_t svbfmlslb[_f32](svfloat32_t zda, svbfloat16_t zn, svbfloat16_t zm)
[Diagrams: 128-bit, 256-bit and 512-bit SVE; larger sizes: 1024-bit and 2048-bit SVE]

For each even-numbered (bottom) BFloat16 element, multiply the element of Zn by the corresponding element of Zm, subtract the product from the 32-bit float in Zda that overlaps that pair of BFloat16 elements, and write the result back to Zda. The odd-numbered BFloat16 elements are not used. The operation is the same at every vector length.
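A portable scalar model may help pin down which lanes take part. This is a sketch, not Arm's pseudocode: the names bf16_to_f32 and bfmlslb_model are hypothetical, BFloat16 values are passed as raw uint16_t bit patterns, and FPCR effects (flush-to-zero, default NaN, non-default rounding modes) are ignored, assuming round-to-nearest. It relies on the fact that the product of two BFloat16 values (8-bit significands) fits exactly in single precision whenever it neither overflows nor underflows, so the subtraction is the only rounding step in that range; extreme overflow/underflow cases are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* BFloat16 is the upper 16 bits of an IEEE-754 single, so widening is exact. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar model of BFMLSLB Zda.S, Zn.H, Zm.H.
 * vl_bits: SVE vector length in bits (a multiple of 128, up to 2048).
 * zda holds vl_bits/32 floats; zn and zm hold vl_bits/16 BFloat16 values. */
static void bfmlslb_model(float *zda, const uint16_t *zn, const uint16_t *zm,
                          size_t vl_bits) {
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    for (size_t i = 0; i < vl_bits / 32; i++) {
        /* Only the even ("bottom") BFloat16 element of each pair is used. */
        float a = bf16_to_f32(zn[2 * i]);
        float b = bf16_to_f32(zm[2 * i]);
        /* Two 8-bit significands give a product of at most 16 bits, so a * b
         * is exact in float unless it overflows or underflows (assumption:
         * those cases are out of scope for this sketch). */
        zda[i] = zda[i] - a * b;
    }
}
```

In the intrinsic form, svbfmlslb[_f32] returns the updated accumulator instead of modifying zda in place; in the model, the zda array plays both roles.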
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.