SVE Instruction List by Dougall Johnson
See "CLASTA (vectors)" in the exploration tools

CLASTA (vectors): Conditionally extract element after last to vector register

CLASTA Zdn.D, Pg, Zdn.D, Zm.D (SVE) (SME)
svfloat64_t svclasta[_f64](svbool_t pg, svfloat64_t fallback, svfloat64_t data)
svint64_t svclasta[_s64](svbool_t pg, svint64_t fallback, svint64_t data)
svuint64_t svclasta[_u64](svbool_t pg, svuint64_t fallback, svuint64_t data)

128-bit SVE

Find the last (leftmost) 64-bit element of the data vector (2) whose corresponding predicate bit in (1) is non-zero, then broadcast the element that follows it to all 64-bit lanes of the result (4). If the predicate bit for the final (leftmost) element is non-zero, there is no following element, so the selection wraps around and the first (rightmost) element of (2) is broadcast to all lanes of (4). If all corresponding predicate bits are zero, (4) keeps the fallback value from (3).
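The three cases above can be sketched as a scalar model in portable C. This is only an illustration, not the instruction's official pseudocode: the function name `clasta_u64_model` and the one-`bool`-per-lane predicate representation are assumptions made for the example, and the argument order merely mirrors `svclasta[_u64](pg, fallback, data)`. Lane 0 stands for the first (rightmost) element, and `lanes` is the vector length in bits divided by 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of CLASTA (vectors) on 64-bit lanes.
 * pg[i] stands in for the predicate bit governing lane i.
 * result may alias fallback, matching the destructive Zdn operand. */
static void clasta_u64_model(size_t lanes, const bool *pg,
                             const uint64_t *fallback,
                             const uint64_t *data, uint64_t *result)
{
    assert(lanes > 0);

    /* Locate the last (highest-numbered) active lane, if any. */
    ptrdiff_t last = -1;
    for (size_t i = 0; i < lanes; i++) {
        if (pg[i]) {
            last = (ptrdiff_t)i;
        }
    }

    /* No active lanes: the result keeps the fallback value. */
    if (last < 0) {
        for (size_t i = 0; i < lanes; i++) {
            result[i] = fallback[i];
        }
        return;
    }

    /* Take the element after the last active one, wrapping to lane 0
     * when the last active lane is the final lane of the vector. */
    size_t next = (size_t)last + 1;
    if (next >= lanes) {
        next = 0;
    }

    /* Read the value before writing, in case result aliases data. */
    uint64_t value = data[next];
    for (size_t i = 0; i < lanes; i++) {
        result[i] = value;
    }
}
```

With `lanes` set to 2 the model corresponds to the 128-bit case; larger vector lengths only change the lane count.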

256-bit, 512-bit, 1024-bit and 2048-bit SVE

The operation is the same at these vector lengths as at 128 bits; only the number of 64-bit lanes changes (4, 8, 16 and 32 respectively).

Inspired by and based on the x86/x64 SIMD Instruction List by Daytime.