SVE Instruction List by Dougall Johnson
LD1ROW (scalar plus scalar): Contiguous load and replicate eight words (scalar index)
LD1ROW { Zt.S }, Pg/Z, [Xn, Xm, LSL #2] (SVE+F64MM+NS)
svfloat32_t svld1ro[_f32](svbool_t pg, const float32_t *base)
svint32_t svld1ro[_s32](svbool_t pg, const int32_t *base)
svuint32_t svld1ro[_u32](svbool_t pg, const uint32_t *base)
128-bit SVE
This operation is undefined for 128-bit SVE.
256-bit SVE
[Diagram omitted: (1) governing predicate, (2) memory operand, (3) destination register]
Load each 32-bit element in the low 256-bit segment of (3) from the memory operand (2), or zero the element if the corresponding predicate bit in (1) is zero, then replicate that 256-bit segment to fill the register, ignoring the predicate. If the predicate bit corresponding to an element in the low 256-bit segment of (3) is zero, that load is skipped and cannot cause a fault.
512-bit SVE
[Diagram omitted: (1) governing predicate, (2) memory operand, (3) destination register]
Load each 32-bit element in the low 256-bit segment of (3) from the memory operand (2), or zero the element if the corresponding predicate bit in (1) is zero, then replicate that 256-bit segment to fill the register, ignoring the predicate. If the predicate bit corresponding to an element in the low 256-bit segment of (3) is zero, that load is skipped and cannot cause a fault.
Larger sizes
1024-bit SVE
[Diagram omitted: (1) governing predicate, (2) memory operand, (3) destination register]
Load each 32-bit element in the low 256-bit segment of (3) from the memory operand (2), or zero the element if the corresponding predicate bit in (1) is zero, then replicate that 256-bit segment to fill the register, ignoring the predicate. If the predicate bit corresponding to an element in the low 256-bit segment of (3) is zero, that load is skipped and cannot cause a fault.
2048-bit SVE
[Diagram omitted: (1) governing predicate, (2) memory operand, (3) destination register]
Load each 32-bit element in the low 256-bit segment of (3) from the memory operand (2), or zero the element if the corresponding predicate bit in (1) is zero, then replicate that 256-bit segment to fill the register, ignoring the predicate. If the predicate bit corresponding to an element in the low 256-bit segment of (3) is zero, that load is skipped and cannot cause a fault.
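The behaviour described for each vector length can be sketched as a scalar reference model in portable C. This is only an illustration and not part of the ACLE: the function name ld1row_model and its parameters are hypothetical, the predicate is modelled as one bool per 32-bit element rather than as a real predicate register, and the vector length is assumed to be a multiple of 256 bits, which covers the sizes listed on this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of LD1ROW (scalar plus scalar); not an ACLE API.
 *   zt      destination, holding vl_bits / 32 words
 *   pg      one bool per 32-bit element; only the first eight are consulted
 *   xn, xm  base pointer and word index (indexing a uint32_t pointer gives
 *           the LSL #2 scaling)
 *   vl_bits vector length in bits; assumed to be a multiple of 256
 */
static void ld1row_model(uint32_t *zt, const bool *pg,
                         const uint32_t *xn, uint64_t xm, unsigned vl_bits)
{
    assert(vl_bits >= 256 && vl_bits <= 2048 && vl_bits % 256 == 0);

    /* Build the low 256-bit segment: eight words, predicated with zeroing. */
    uint32_t seg[8];
    for (size_t i = 0; i < 8; i++) {
        /* An inactive element is zeroed and its memory is never read,
         * so it cannot fault. */
        seg[i] = pg[i] ? xn[xm + i] : 0;
    }

    /* Replicate the segment across the whole register, ignoring pg. */
    size_t words = vl_bits / 32;
    for (size_t i = 0; i < words; i++) {
        zt[i] = seg[i % 8];
    }
}
```

With vl_bits set to 256 the replication loop simply copies the segment once; at 512 bits and above, every 256-bit segment of zt repeats the same eight words, including the zeros produced by inactive predicate bits.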
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.