SVE | SME (WIP, no diagrams)

A64 SIMD Instruction List: SME Instructions

This is inspired by and based on the x86/x64 SIMD Instruction List by Daytime.

This is not an official reference, and may contain mistakes. It is intended to make it easier to find instructions, and to provide an alternative perspective. While writing SME code, please refer to the Arm® Exploration Tools, or the Arm® ARM with the SME Supplement.

Merging and zeroing predication is typically omitted from the diagrams. It is shown for operations, such as BRKN and LD1RQB, that use the /Z syntax but have unusual semantics.

This is an ongoing project: dark-red links are missing full descriptions, and bright-red links are also missing diagrams and instead link to the documentation in the Exploration Tools.

Report mistakes or send feedback.

Target

Note: this does not support filtering by vector length, so some unavailable operations may appear available even after selecting a preset.

SME Moves

128-bit 64-bit 32-bit 16-bit 8-bit
zip
unzip
unpack
move to/from tile
svwrite_za16[_{s,u,f,bf}16]_vg1x2
svwrite_za32[_{s,u,f}32]_vg1x2
svwrite_za64[_{s,u,f}64]_vg1x2
svwrite_za8[_{s,u,mf}8]_vg1x2
svwrite_za16[_{s,u,f,bf}16]_vg1x4
svwrite_za32[_{s,u,f}32]_vg1x4
svwrite_za64[_{s,u,f}64]_vg1x4
svwrite_za8[_{s,u,mf}8]_vg1x4
svread_za16_{s,u,f,bf}16_vg1x2
svread_za32_{s,u,f}32_vg1x2
svread_za64_{s,u,f}64_vg1x2
svread_za8_{s,u,mf}8_vg1x2
svread_za16_{s,u,f,bf}16_vg1x4
svread_za32_{s,u,f}32_vg1x4
svread_za64_{s,u,f}64_vg1x4
svread_za8_{s,u,mf}8_vg1x4
move from tile and zero
svreadz_za16_{s,u,f,bf}16_vg1x2
svreadz_za32_{s,u,f}32_vg1x2
svreadz_za64_{s,u,f}64_vg1x2
svreadz_za8_{s,u,mf}8_vg1x2
svreadz_za16_{s,u,f,bf}16_vg1x4
svreadz_za32_{s,u,f}32_vg1x4
svreadz_za64_{s,u,f}64_vg1x4
svreadz_za8_{s,u,mf}8_vg1x4
zero vector groups
zero tile
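The tile rows above all refer to the same ZA storage, so it can help to see how tiles overlay the ZA array. The sketch below is a scalar C model under these assumptions: ZA has svl_bytes rows, tile t of an element size of esize bytes owns every ZA row r with r % esize == t, and ZERO takes an 8-bit mask where bit n selects the 64-bit tile ZAn.D. The function names are hypothetical, not ACLE intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* ZA row holding horizontal slice `slice` of tile `tile`.
 * esize: element size in bytes; tiles of that size are numbered 0..esize-1
 * and are interleaved, so tile t owns ZA rows t, t+esize, t+2*esize, ... */
static unsigned za_row_of_hslice(unsigned esize, unsigned tile, unsigned slice,
                                 unsigned svl_bytes) {
    assert(tile < esize);
    assert(slice < svl_bytes / esize);  /* a tile has svl_bytes/esize rows */
    return slice * esize + tile;
}

/* Mask for ZERO {mask} that clears one tile. Bit n selects ZAn.D (ZA rows
 * with r % 8 == n), so a wider-element tile is the union of every ZAn.D
 * with n % esize == tile. esize must be 1, 2, 4, or 8. */
static uint8_t zero_mask_for_tile(unsigned esize, unsigned tile) {
    assert(esize == 1 || esize == 2 || esize == 4 || esize == 8);
    assert(tile < esize);
    uint8_t mask = 0;
    for (unsigned n = 0; n < 8; n++)
        if (n % esize == tile)
            mask |= (uint8_t)(1u << n);
    return mask;
}
```

For example, zeroing ZA0.S gives mask 0x11 (ZA0.D and ZA4.D), and zeroing ZA0.B gives 0xFF, the whole array.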

SME Load Operations

128-bit 64-bit 32-bit 16-bit 8-bit
load table register
load ZA row (unpredicated)
load tile slice
load strided registers
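The unpredicated "load ZA row" form (LDR ZA[Wv, offs], [Xn, #offs, MUL VL]) uses one immediate for both the ZA row and the memory offset. A minimal C sketch of that addressing follows, assuming the ZA row index wraps modulo the number of ZA rows (svl_bytes) and that offs is limited to 0..15; this is my reading of the architecture, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where LDR ZA[Wv, offs], [Xn, #offs, MUL VL] reads from and writes to. */
typedef struct {
    unsigned za_vector;   /* ZA array row that is written */
    uint64_t address;     /* address of the first byte read */
} ldr_za_target;

static ldr_za_target ldr_za(uint32_t wv, unsigned offs, uint64_t xn,
                            unsigned svl_bytes) {
    assert(offs <= 15);
    ldr_za_target t;
    /* The same immediate selects the row (assumed to wrap)... */
    t.za_vector = (unsigned)(((uint64_t)wv + offs) % svl_bytes);
    /* ...and advances the address by offs whole vectors. */
    t.address = xn + (uint64_t)offs * svl_bytes;
    return t;
}
```

Because the row and the address move together, a loop over Wv with a matching base pointer can spill or fill the whole ZA array one row at a time.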

SME Store Operations

128-bit 64-bit 32-bit 16-bit 8-bit
store table register
store ZA row (unpredicated)
store tile slice
store strided registers
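The strided load and store forms use register lists such as { Z0.B, Z8.B } or { Z16.B, Z20.B, Z24.B, Z28.B }, while memory is still accessed as consecutive vectors (register k of the list maps to the k-th vector-length chunk after the base address). The sketch below models only which register numbers are encodable, assuming two-register lists start at Z0-Z7 or Z16-Z23 with stride 8, and four-register lists start at Z0-Z3 or Z16-Z19 with stride 4. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Fill regs[0..nreg-1] with the register numbers of a strided list that
 * starts at Z<first>. Returns false if `first` cannot start such a list. */
static bool strided_list(unsigned first, unsigned nreg, unsigned regs[4]) {
    assert(nreg == 2 || nreg == 4);
    unsigned stride = 16 / nreg;              /* 8 for x2, 4 for x4 */
    unsigned base = first & 16;               /* lower or upper half: 0 or 16 */
    if (first > 31 || first - base >= stride)
        return false;
    for (unsigned k = 0; k < nreg; k++)
        regs[k] = first + k * stride;         /* regs[k] uses memory chunk k */
    return true;
}
```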

SME Vector Conversions

Integer Floating-Point
64-bit 32-bit 16-bit 8-bit double single half BFloat16
int to float
float to int
float to float
int to int
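One detail of the "float to int" row that differs from plain C is out-of-range handling: FCVTZS rounds toward zero, saturates values that do not fit, and converts NaN to 0, while a C cast of an out-of-range float is undefined behaviour. A one-lane C model for 32-bit elements is sketched below; the function name is hypothetical, and the multi-vector forms apply the same rule to every lane.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* One lane of FCVTZS .S <- .S: truncate, saturate, NaN -> 0. */
static int32_t fcvtzs_s32(float x) {
    if (isnan(x))
        return 0;
    if (x >= 2147483648.0f)     /* 2^31 and above, including +inf */
        return INT32_MAX;
    if (x < -2147483648.0f)     /* below -2^31, including -inf */
        return INT32_MIN;
    return (int32_t)x;          /* in range: C casts truncate toward zero */
}
```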

SME Vector Arithmetic

Integer Floating-Point
64-bit 32-bit 16-bit 8-bit double single half BFloat16
add
clamp
max
min
round
select
mulh
multiply
scale
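Two rows in this table have per-lane semantics that are easy to misremember. The sketch below models one 16-bit lane of each in C, under stated assumptions: "clamp" (SCLAMP) computes min(max(d, lo), hi), so hi wins when lo > hi, and "mulh" (SQDMULH) returns the high half of the doubled product 2*a*b, saturating only for INT16_MIN * INT16_MIN. The function names are hypothetical, and the model assumes the compiler implements >> on negative values as an arithmetic shift, as mainstream compilers do.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of SCLAMP .H */
static int16_t sclamp_s16(int16_t d, int16_t lo, int16_t hi) {
    int16_t t = d > lo ? d : lo;   /* max(d, lo) */
    return t < hi ? t : hi;        /* min(max(d, lo), hi) */
}

/* One lane of SQDMULH .H */
static int16_t sqdmulh_s16(int16_t a, int16_t b) {
    int64_t p = 2 * (int64_t)a * b;   /* doubled product, no overflow in 64 bits */
    int64_t r = p >> 16;              /* high half (floor) */
    if (r > INT16_MAX)                /* only when a == b == INT16_MIN */
        r = INT16_MAX;
    return (int16_t)r;
}
```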

SME Vector Shifts

64-bit 32-bit 16-bit 8-bit
shift right
shift left
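Many of the SME2 right shifts are saturating rounding narrowing shifts, such as SQRSHR from 32-bit to 16-bit elements. A one-lane C sketch is below, assuming a shift amount of 1..16 for this size, rounding by adding 1 << (shift - 1) before shifting, and signed saturation of the result. The function name is hypothetical, and arithmetic >> on negative values is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a signed saturating rounding shift right narrow, 32 -> 16. */
static int16_t sqrshr_s32_to_s16(int32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    int64_t r = ((int64_t)x + ((int64_t)1 << (shift - 1))) >> shift;
    if (r > INT16_MAX)
        return INT16_MAX;
    if (r < INT16_MIN)
        return INT16_MIN;
    return (int16_t)r;
}
```

Ties round toward positive infinity, so 5 >> 1 gives 3 but -5 >> 1 gives -2.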

SME Table Operations

32-bit 16-bit 8-bit
table lookup (2-bit indices)
table lookup (4-bit indices)
table lookup (6-bit indices)
move/zero table register
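The table lookups read their table from ZT0, a 512-bit (64-byte) register. The sketch below models only the element lookup of LUTI4 with 32-bit elements, assuming each 4-bit index selects one of the 16 words of ZT0; it does not model how the packed indices are extracted from the source vector (which segment the immediate selects). The function name is hypothetical, and memcpy uses host byte order, which matches A64 lane order on little-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* out[i] = 32-bit word number idx[i] of ZT0, for n lanes. */
static void luti4_s32(const uint8_t zt0[64], const uint8_t *idx,
                      uint32_t *out, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        uint32_t w;
        memcpy(&w, zt0 + 4u * (idx[i] & 0xFu), sizeof w);
        out[i] = w;
    }
}
```

With 8-bit elements the same 16-entry table covers only the first 16 bytes of ZT0, which is one reason the wider-index lookups exist.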

SME Full Tile Operations

Integer Floating-Point
32-bit 16-bit 8-bit double single half BFloat16 FP8
outer product and accumulate
outer product and subtract
quarter-tile outer product and accumulate
quarter-tile outer product and subtract
sparse outer product
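A scalar C model of two outer products is sketched below: FMOPA/FMOPS on 32-bit floats with both predicates, and the widening SMOPA from 8-bit to 32-bit with all lanes assumed active. Assumptions: dim is the number of 32-bit elements per vector (at most 64 for a 2048-bit SVL), the tile is stored row-major, element (i, j) is updated only when pn[i] and pm[j] are both active, and SMOPA element (i, j) accumulates the 4-way dot product of byte groups 4i..4i+3 and 4j..4j+3. The function names are hypothetical; hardware fuses the float multiply-add and wraps integer sums, while this model rounds twice and assumes no overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FMOPA (subtract == false) or FMOPS (subtract == true), .S elements. */
static void fmopa_s(float *za, const float *zn, const float *zm,
                    const bool *pn, const bool *pm, unsigned dim,
                    bool subtract) {
    assert(dim > 0 && dim <= 64);
    for (unsigned i = 0; i < dim; i++)
        for (unsigned j = 0; j < dim; j++)
            if (pn[i] && pm[j]) {
                float p = zn[i] * zm[j];
                za[i * dim + j] += subtract ? -p : p;
            }
}

/* SMOPA ZAda.S, Zn.B, Zm.B with every lane active. */
static void smopa_s_b(int32_t *za, const int8_t *zn, const int8_t *zm,
                      unsigned dim) {
    assert(dim > 0 && dim <= 64);
    for (unsigned i = 0; i < dim; i++)
        for (unsigned j = 0; j < dim; j++)
            for (unsigned k = 0; k < 4; k++)
                za[i * dim + j] += (int32_t)zn[4 * i + k] * zm[4 * j + k];
}
```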

SME Tile Operations

Integer Floating-Point
64-bit 32-bit 16-bit 8-bit double single half BFloat16 FP8
add
subtract
multiply-add
multiply-subtract
multiply-add long
multiply-subtract long
multiply-add long long
multiply-subtract long long
dot product
vertical dot product
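Unlike the outer products, the "dot product" row accumulates into ZA array vectors lane by lane: lane i combines only lane group i of each source. The sketch below models one destination vector of a 4-way signed dot product such as SDOT ZA.S[Wv, offs, VGx2], {Zn1.B-Zn2.B}, {Zm1.B-Zm2.B}, assuming n32 is the number of 32-bit lanes; the vector-group selection (Wv, offs, VGx2) and the vertical variant are not modelled. The function name is hypothetical, and the model assumes no 32-bit overflow, where hardware wraps.

```c
#include <assert.h>
#include <stdint.h>

/* za_vec[i] += sum of zn[4i+k] * zm[4i+k] for k = 0..3 */
static void sdot_za_s_b(int32_t *za_vec, const int8_t *zn, const int8_t *zm,
                        unsigned n32) {
    assert(n32 > 0);
    for (unsigned i = 0; i < n32; i++) {
        int32_t sum = 0;
        for (unsigned k = 0; k < 4; k++)
            sum += (int32_t)zn[4 * i + k] * zm[4 * i + k];
        za_vec[i] += sum;
    }
}
```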

Scalar Operations

Add multiple of streaming SVE mode predicate length in bytes
Add multiple of streaming SVE mode vector length in bytes
Get multiple of streaming SVE mode vector length in bytes
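These three scalar operations correspond to ADDSPL, ADDSVL, and RDSVL. A C sketch of their arithmetic follows, assuming a signed immediate in the range -32..31 and a predicate length of svl_bytes / 8 bytes; the function names are hypothetical, and the 64-bit wraparound of the real instructions is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* RDSVL Xd, #imm: imm * streaming vector length in bytes */
static int64_t rdsvl(int imm, unsigned svl_bytes) {
    assert(imm >= -32 && imm <= 31);
    return (int64_t)imm * svl_bytes;
}

/* ADDSVL Xd, Xn, #imm: Xn + imm * streaming vector length in bytes */
static int64_t addsvl(int64_t xn, int imm, unsigned svl_bytes) {
    assert(imm >= -32 && imm <= 31);
    return xn + (int64_t)imm * svl_bytes;
}

/* ADDSPL Xd, Xn, #imm: Xn + imm * streaming predicate length in bytes */
static int64_t addspl(int64_t xn, int imm, unsigned svl_bytes) {
    assert(imm >= -32 && imm <= 31);
    return xn + (int64_t)imm * (svl_bytes / 8);
}
```

With a 512-bit SVL (svl_bytes = 64), RDSVL #1 gives 64, and ADDSPL by 3 adds 24 bytes.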

Created by Dougall Johnson, 2023-2026, with LLM assistance since 2026.
Arm is a registered trademark of Arm Limited (or its subsidiaries) in some places.