SVE Instruction List by Dougall Johnson
LD4B (scalar plus scalar): Contiguous load four-byte structures to four vectors (scalar index)
LD4B { Zt1.B, Zt2.B, Zt3.B, Zt4.B }, Pg/Z, [Xn, Xm] (SVE) (SME)
svint8x4_t svld4[_s8](svbool_t pg, const int8_t *base)
svuint8x4_t svld4[_u8](svbool_t pg, const uint8_t *base)
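As a rough illustration of how the intrinsics above might be used, the sketch below splits interleaved 4-byte pixels into four byte planes. The function name `split_rgba` and the RGBA interpretation are hypothetical, chosen only for this example. The SVE path is compiled only when `__ARM_FEATURE_SVE` is defined; otherwise a plain scalar loop with the same result is used. Whether the compiler emits the scalar plus scalar form of LD4B for the `svld4_u8` call is its own choice and is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example: deinterleave n 4-byte pixels into four planes. */
void split_rgba(const uint8_t *rgba, size_t n,
                uint8_t *r, uint8_t *g, uint8_t *b, uint8_t *a)
{
#if defined(__ARM_FEATURE_SVE)
    for (uint64_t i = 0; i < n; i += svcntb()) {
        /* Active lanes cover pixels i .. n-1, so the tail is never read. */
        svbool_t pg = svwhilelt_b8_u64(i, (uint64_t)n);
        /* One structure load fills four vectors, one per channel. */
        svuint8x4_t px = svld4_u8(pg, rgba + 4 * i);
        svst1_u8(pg, r + i, svget4_u8(px, 0));
        svst1_u8(pg, g + i, svget4_u8(px, 1));
        svst1_u8(pg, b + i, svget4_u8(px, 2));
        svst1_u8(pg, a + i, svget4_u8(px, 3));
    }
#else
    /* Portable fallback producing the same planes without SVE. */
    for (size_t i = 0; i < n; i++) {
        r[i] = rgba[4 * i + 0];
        g[i] = rgba[4 * i + 1];
        b[i] = rgba[4 * i + 2];
        a[i] = rgba[4 * i + 3];
    }
#endif
}
```

The predicate from `svwhilelt_b8_u64` keeps the final partial iteration from touching memory past the end of the input, which relies on inactive elements not being loaded.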
128-bit SVE
Load and deinterleave groups of four interleaved 8-bit values from the memory operand (1) into the 8-bit elements of four consecutive registers (2), (3), (4), and (5). If the predicate bit for an element position is zero, the four contiguous loads for that position are skipped and cannot cause a fault, and the corresponding element in each of (2), (3), (4), and (5) is set to zero.
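The behavior described above can be sketched as a scalar reference model. This is an illustrative approximation, not the architectural pseudocode: the name `ld4b_model`, the one-byte-per-element predicate array, and the `MAX_VL_BYTES` constant are assumptions made for the example. The address is taken as Xn plus Xm with no scaling, since the elements are bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_VL_BYTES = 256 };  /* 2048-bit vectors, the largest SVE length */

/* Hypothetical scalar model of LD4B { Zt1.B-Zt4.B }, Pg/Z, [Xn, Xm].
 * z[r][e]  : element e of destination register r (r = 0..3)
 * pg[e]    : nonzero if element position e is active
 * vl_bytes : vector length in bytes (16 for 128-bit SVE) */
void ld4b_model(uint8_t z[4][MAX_VL_BYTES], const uint8_t *pg,
                const uint8_t *xn, uint64_t xm, size_t vl_bytes)
{
    assert(vl_bytes >= 16 && vl_bytes <= MAX_VL_BYTES);
    const uint8_t *addr = xn + xm;  /* byte index, no scaling */

    for (size_t e = 0; e < vl_bytes; e++) {
        for (size_t r = 0; r < 4; r++) {
            if (pg[e]) {
                /* Group e occupies bytes 4*e .. 4*e+3; byte r goes to register r. */
                z[r][e] = addr[4 * e + r];
            } else {
                /* Inactive: memory is not read, element is zeroed. */
                z[r][e] = 0;
            }
        }
    }
}
```

With `vl_bytes` set to 16 the model consumes up to 64 bytes of memory; longer vectors only change the element count, not the deinterleaving pattern.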
256-bit, 512-bit, 1024-bit, and 2048-bit SVE
Same operation as for 128-bit SVE, with 32, 64, 128, and 256 8-bit elements per register respectively.
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.