SVE Instruction List by Dougall Johnson
LD4W (scalar plus immediate): Contiguous load four-word structures to four vectors (immediate index)
LD4W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, Pg/Z, [Xn{, #imm, MUL VL}] (SVE) (SME)
svfloat32x4_t svld4_vnum[_f32](svbool_t pg, const float32_t *base, int64_t vnum)
svint32x4_t svld4_vnum[_s32](svbool_t pg, const int32_t *base, int64_t vnum)
svuint32x4_t svld4_vnum[_u32](svbool_t pg, const uint32_t *base, int64_t vnum)
128-bit SVE
Load and deinterleave groups of four interleaved 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). If the predicate bit for an element position is zero, the four contiguous loads for that group are skipped and cannot cause a fault, and the corresponding element in each of (2), (3), (4), and (5) is set to zero.
256-bit, 512-bit, 1024-bit, and 2048-bit SVE
Same operation as for 128-bit SVE. Each register holds 8, 16, 32, or 64 32-bit elements respectively, instead of 4.
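The deinterleaving and zeroing predication described above can be sketched as a scalar reference model in portable C. This is only an illustration under stated assumptions, not Arm's pseudocode and not the intrinsic itself: the function name `ld4w_model`, the one-byte-per-element predicate layout, and the `elems` parameter (vector length in bits divided by 32) are all hypothetical, and `base` is taken to already include any `#imm, MUL VL` displacement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of LD4W for a single vector length.
 *   elems  : 32-bit elements per register (vector bits / 32), e.g. 4 for 128-bit
 *   pg     : one flag per element position, nonzero means active (assumed layout)
 *   base   : address of the first interleaved group of four words
 *   z1..z4 : the four destination registers, `elems` elements each */
static void ld4w_model(size_t elems, const uint8_t *pg, const uint32_t *base,
                       uint32_t *z1, uint32_t *z2, uint32_t *z3, uint32_t *z4)
{
    assert(pg != NULL && z1 != NULL && z2 != NULL && z3 != NULL && z4 != NULL);

    for (size_t e = 0; e < elems; e++) {
        if (pg[e]) {
            /* Active position: read the four contiguous words of group e
             * and send one word to each destination register. */
            z1[e] = base[4 * e + 0];
            z2[e] = base[4 * e + 1];
            z3[e] = base[4 * e + 2];
            z4[e] = base[4 * e + 3];
        } else {
            /* Inactive position: memory is not read at all (so no fault is
             * possible in this model) and the four elements become zero. */
            z1[e] = 0;
            z2[e] = 0;
            z3[e] = 0;
            z4[e] = 0;
        }
    }
}
```

For instance, with `elems` = 4 (128-bit SVE), memory holding the words 0 through 15, and a predicate of {1, 1, 0, 1}, this model produces z1 = {0, 4, 0, 12}, z2 = {1, 5, 0, 13}, z3 = {2, 6, 0, 14}, and z4 = {3, 7, 0, 15}. Words 8 through 11 are never read because the third element position is inactive.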
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.