SVE | SME (WIP, no diagrams)

A64 SIMD Instruction List: SME Instructions

This is inspired by and based on the x86/x64 SIMD Instruction List by Daytime.

This is not an official reference, and may contain mistakes. It is intended to make it easier to find instructions, and to provide an alternative perspective. When writing SME code, please refer to the Arm® Exploration Tools or the Arm® Architecture Reference Manual (Arm® ARM) with the SME supplement.

Merging and zeroing predication are typically omitted from the diagrams, but they are shown for operations such as BRKN and LD1RQB, which use the /Z syntax but have unusual semantics.

This is an ongoing project. Dark-red links are missing full descriptions; bright-red links are also missing diagrams and instead link to the documentation in the exploration tools.

Report mistakes or send feedback.

Target

Note: this does not support filtering by vector length, so some unavailable operations may appear available even after selecting a preset.

Warning: this allows contradictory and invalid configurations.

SME Version | Enabled Extensions | Presets

SME Moves

128-bit 64-bit 32-bit 16-bit 8-bit
zip
unzip
unpack
move to/from tile
move from tile and zero
zero vector groups
zero tile

SME Load Operations

128-bit 64-bit 32-bit 16-bit 8-bit
load table register
load ZA row (unpredicated)
load tile slice
load strided registers

SME Store Operations

128-bit 64-bit 32-bit 16-bit 8-bit
store table register
store ZA row (unpredicated)
store tile slice
store strided registers

SME Vector Conversions

Integer: 64-bit 32-bit 16-bit 8-bit
Floating-Point: double single half BFloat16
int to float
float to int
float to float
int to int

SME Vector Arithmetic

Integer: 64-bit 32-bit 16-bit 8-bit
Floating-Point: double single half BFloat16
add
clamp
max
min
round
select
mulh
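
As a rough illustration of the "clamp" row above, the following is a minimal scalar sketch in C of a per-element signed clamp, where each destination element is limited to a range given by the corresponding elements of two source vectors. The function name `clamp_s32` and the plain-array representation of vector registers are assumptions made for illustration; this is not how the instruction is encoded or invoked, and the official documentation remains the authority on exact behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a per-element signed clamp (hypothetical helper).
 * zd: destination elements, clamped in place
 * lo: per-element minimum, hi: per-element maximum
 * n:  number of elements (stands in for the vector length) */
void clamp_s32(int32_t *zd, const int32_t *lo, const int32_t *hi, size_t n)
{
    assert(zd && lo && hi);
    for (size_t i = 0; i < n; i++) {
        int32_t v = zd[i];
        if (v < lo[i]) v = lo[i];  /* max(zd, lo) */
        if (v > hi[i]) v = hi[i];  /* min(max(zd, lo), hi) */
        zd[i] = v;
    }
}
```

The maximum is applied before the minimum, so if a lane ever has `lo > hi` this sketch returns `hi` for that lane.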

SME Vector Shifts

64-bit 32-bit 16-bit 8-bit
shift right
shift left

SME Table Operations

32-bit 16-bit 8-bit
table lookup (2-bit indices)
table lookup (4-bit indices)
zero table register

SME Full Tile Operations

Integer: 32-bit 16-bit 8-bit
Floating-Point: double single half BFloat16
outer product and accumulate
outer product and subtract
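
To give a feel for what "outer product and accumulate" means, here is a minimal scalar sketch in C for the non-widening single-precision case. It assumes a square tile of n x n floats stored row-major, two source vectors of n elements, and two predicates, one gating rows and one gating columns. The name `outer_product_accumulate_f32`, the array layout, and the `bool` predicates are illustrative assumptions. Rounding behaviour (for example whether the multiply and add are fused) is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a predicated outer product and accumulate
 * (hypothetical helper, not an intrinsic).
 * za: n x n tile, row-major, updated in place
 * zn: row source vector, zm: column source vector
 * pn: row predicate, pm: column predicate
 * Inactive rows or columns leave the tile element unchanged. */
void outer_product_accumulate_f32(float *za, const float *zn, const float *zm,
                                  const bool *pn, const bool *pm, size_t n)
{
    assert(za && zn && zm && pn && pm);
    for (size_t row = 0; row < n; row++) {
        for (size_t col = 0; col < n; col++) {
            if (pn[row] && pm[col]) {
                za[row * n + col] += zn[row] * zm[col];
            }
        }
    }
}
```

The subtract form would follow the same loop structure but subtract the product from the tile element instead of adding it.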

SME Tile Operations

Integer: 64-bit 32-bit 16-bit 8-bit
Floating-Point: double single half BFloat16
add
subtract
multiply-add
multiply-subtract
multiply-add long
multiply-subtract long
multiply-add long long
multiply-subtract long long
dot product
vertical dot product

Scalar Operations

Add multiple of streaming SVE mode predicate length in bytes
Add multiple of streaming SVE mode vector length in bytes
Get multiple of streaming SVE mode vector length in bytes
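
The three scalar operations above can be sketched as simple arithmetic. The C model below assumes the streaming vector length is passed in as `svl_bytes`, that the predicate length is one eighth of the vector length (one predicate bit per vector byte), and that the multiplier is a signed immediate in the range -32 to 31. The `model_*` function names are hypothetical; real code would use the instructions themselves, which read the vector length from the current hardware configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Get multiple of streaming vector length in bytes: imm * SVL. */
uint64_t model_rdsvl(uint64_t svl_bytes, int imm)
{
    assert(imm >= -32 && imm <= 31);  /* assumed signed 6-bit immediate */
    return (uint64_t)((int64_t)imm * (int64_t)svl_bytes);
}

/* Add multiple of streaming vector length in bytes: xn + imm * SVL. */
uint64_t model_addsvl(uint64_t xn, uint64_t svl_bytes, int imm)
{
    return xn + model_rdsvl(svl_bytes, imm);
}

/* Add multiple of streaming predicate length in bytes: xn + imm * (SVL / 8). */
uint64_t model_addspl(uint64_t xn, uint64_t svl_bytes, int imm)
{
    return xn + model_rdsvl(svl_bytes / 8, imm);
}
```

For example, with a 64-byte (512-bit) streaming vector length, `model_addsvl(1000, 64, -1)` gives 936, which is the kind of calculation used to reserve one vector's worth of stack space.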

Created by Dougall Johnson, 2023.
Arm is a registered trademark of Arm Limited (or its subsidiaries) in some places.