SVE Instruction List by Dougall Johnson
LD1W (scalar plus immediate, consecutive registers): Contiguous load of words to multiple consecutive vectors (immediate index)
LD1W { Zt1.S, Zt2.S, Zt3.S, Zt4.S }, PNg/Z, [Xn{, #imm, MUL VL}] (SVE2.1 (SME2+S
128-bit SVE
Load 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). The predicate is first decoded from its predicate-as-counter representation into a quadruple-length predicate. If the predicate bit corresponding to an element is zero, that load is skipped and cannot cause a fault, and the element is set to zero. The first destination register number (2) must be divisible by four.
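The behaviour described above can be sketched as a small Python model. This is an illustration only, not the architectural pseudocode: the function name `ld1w_x4` and its parameters are hypothetical, the predicate is assumed to be already decoded into a plain list of booleans (the predicate-as-counter decoding step is not modelled), memory is assumed to be little-endian, and the immediate is assumed to be scaled by the vector length in bytes, as the `MUL VL` syntax suggests. Any range or alignment constraints on the immediate are not checked.

```python
import struct

def ld1w_x4(memory, base, mask, vl_bits=128, first_reg=0, imm=0):
    """Hypothetical model of the four-register LD1W form.

    memory: bytes-like object standing in for memory
    mask:   already-decoded quadruple-length predicate (one bool per element)
    Returns a dict mapping register number -> list of 32-bit element values.
    """
    if first_reg % 4 != 0:
        raise ValueError("first destination register number must be divisible by four")
    elems = vl_bits // 32                  # 32-bit elements per vector
    if len(mask) != 4 * elems:
        raise ValueError("mask must cover four vectors of elements")
    vl_bytes = vl_bits // 8
    start = base + imm * vl_bytes          # assumption: MUL VL scales by vector length
    regs = {}
    for r in range(4):
        lanes = []
        for e in range(elems):
            i = r * elems + e              # element index across all four vectors
            if mask[i]:
                # Active element: read one little-endian 32-bit word.
                (value,) = struct.unpack_from("<I", memory, start + 4 * i)
                lanes.append(value)
            else:
                # Inactive element: memory is not touched, element becomes zero.
                lanes.append(0)
        regs[first_reg + r] = lanes
    return regs

# Only 10 words exist in "memory"; the last 6 elements are inactive,
# so the model never reads past the end of the buffer.
memory = struct.pack("<10I", *range(100, 110))
mask = [True] * 10 + [False] * 6
result = ld1w_x4(memory, 0, mask, vl_bits=128, first_reg=4)
for reg in sorted(result):
    print(f"Z{reg}.S = {result[reg]}")
```

With a 128-bit vector length each register holds four words, so the ten active elements fill Z4 and Z5 completely and the first two elements of Z6, while the remaining elements are zeroed without any memory access.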
256-bit SVE
Load 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). The predicate is first decoded from its predicate-as-counter representation into a quadruple-length predicate. If the predicate bit corresponding to an element is zero, that load is skipped and cannot cause a fault, and the element is set to zero. The first destination register number (2) must be divisible by four.
512-bit SVE
Load 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). The predicate is first decoded from its predicate-as-counter representation into a quadruple-length predicate. If the predicate bit corresponding to an element is zero, that load is skipped and cannot cause a fault, and the element is set to zero. The first destination register number (2) must be divisible by four.
Larger sizes
1024-bit SVE
Load 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). The predicate is first decoded from its predicate-as-counter representation into a quadruple-length predicate. If the predicate bit corresponding to an element is zero, that load is skipped and cannot cause a fault, and the element is set to zero. The first destination register number (2) must be divisible by four.
2048-bit SVE
Load 32-bit values from the memory operand (1) into the 32-bit elements of four consecutive registers (2), (3), (4), and (5). The predicate is first decoded from its predicate-as-counter representation into a quadruple-length predicate. If the predicate bit corresponding to an element is zero, that load is skipped and cannot cause a fault, and the element is set to zero. The first destination register number (2) must be divisible by four.
Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.