This is inspired by and based on the x86/x64 SIMD Instruction List by Daytime.
This is not an official reference and may contain mistakes. It is intended to make instructions easier to find and to provide an alternative perspective. When writing SME code, refer to the Arm® Exploration Tools or the Arm® Architecture Reference Manual (Arm ARM) with the SME Supplement.
Merging and zeroing predication are usually omitted from the diagrams. They are shown for operations such as BRKN and LD1RQB, which use the /Z syntax but have unusual semantics.
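
To make the distinction concrete, here is a simplified Python model of a single predicated elementwise add. The function `predicated_add` and its parameters are hypothetical and are not Arm pseudocode. With merging predication (/M), inactive elements keep the destination's previous value; with zeroing predication (/Z), they are set to zero. Which of the two forms a given instruction actually supports varies, so check the Arm documentation.

```python
def predicated_add(dest, a, b, pred, zeroing):
    """Elementwise a + b under a predicate (hypothetical model).

    Active lanes (pred[i] is True) receive a[i] + b[i].
    Inactive lanes keep dest[i] (merging, /M) or become 0 (zeroing, /Z).
    """
    out = []
    for d, x, y, p in zip(dest, a, b, pred):
        if p:
            out.append(x + y)
        elif zeroing:
            out.append(0)
        else:
            out.append(d)
    return out


dest = [9, 9, 9, 9]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
pred = [True, False, True, False]
print(predicated_add(dest, a, b, pred, zeroing=False))  # [11, 9, 33, 9]
print(predicated_add(dest, a, b, pred, zeroing=True))   # [11, 0, 33, 0]
```
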
This is an ongoing project: dark-red links are missing full descriptions, and bright-red links are also missing diagrams and instead link to the documentation in the Exploration Tools.
Report mistakes or send feedback.
Note: the filter does not support filtering by vector length, so some operations that are unavailable at a given vector length may still appear available after a preset is selected.
Warning: the filter allows contradictory and invalid configurations.

| Operation | 128-bit | 64-bit | 32-bit | 16-bit | 8-bit |
|---|---|---|---|---|---|
| zip | | | | | |
| unzip | | | | | |
| unpack | | | | | |
| move to/from tile | | | | | |
| move from tile and zero | | | | | |
| zero vector groups | | | | | |
| zero tile | | | | | |
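
As a rough sketch of the two-register zip and unzip operations: zip interleaves the elements of two source vectors and spreads the interleaved result across two destination registers, while unzip takes the even-indexed and then the odd-indexed elements of the concatenated sources. The helper names `zip2` and `uzp2` are hypothetical, and this ignores the four-register forms.

```python
def zip2(zn, zm):
    """Interleave two equal-length vectors; split the result across two registers."""
    assert len(zn) == len(zm)
    inter = [e for pair in zip(zn, zm) for e in pair]  # n0, m0, n1, m1, ...
    n = len(zn)
    return inter[:n], inter[n:]


def uzp2(zn, zm):
    """De-interleave zn:zm into (even-indexed elements, odd-indexed elements)."""
    both = list(zn) + list(zm)
    return both[0::2], both[1::2]


d1, d2 = zip2([0, 1, 2, 3], [10, 11, 12, 13])
print(d1, d2)        # [0, 10, 1, 11] [2, 12, 3, 13]
print(uzp2(d1, d2))  # ([0, 1, 2, 3], [10, 11, 12, 13])
```

Unzipping the output of zip recovers the original pair of vectors, which is a quick way to check the element order.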

| Operation | 128-bit | 64-bit | 32-bit | 16-bit | 8-bit |
|---|---|---|---|---|---|
| load table register | | | | | |
| load ZA row (unpredicated) | | | | | |
| load tile slice | | | | | |
| load strided registers | | | | | |
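
The strided-register loads write to register groups whose members are spaced apart rather than numbered consecutively, while the data still comes from consecutive vectors in memory. The sketch below lists the register numbers in such a group. It assumes, based on my reading of the SME2 encodings, that two-register groups use Zt and Zt+8 with Zt in Z0-Z7 or Z16-Z23, and four-register groups use Zt, Zt+4, Zt+8 and Zt+12 with Zt in Z0-Z3 or Z16-Z19; `strided_group` is a hypothetical helper, so verify the rules against the Arm documentation.

```python
def strided_group(first, count):
    """Return the Z register numbers in a strided SME2 register group.

    Assumed rules: 2 registers -> stride 8, first register in 0-7 or 16-23;
                   4 registers -> stride 4, first register in 0-3 or 16-19.
    """
    if count == 2:
        stride, valid = 8, list(range(0, 8)) + list(range(16, 24))
    elif count == 4:
        stride, valid = 4, list(range(0, 4)) + list(range(16, 20))
    else:
        raise ValueError("strided groups have 2 or 4 registers")
    if first not in valid:
        raise ValueError(f"Z{first} cannot start a {count}-register strided group")
    return [first + i * stride for i in range(count)]


print(strided_group(1, 4))   # [1, 5, 9, 13]
print(strided_group(17, 2))  # [17, 25]
```
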

| Operation | 128-bit | 64-bit | 32-bit | 16-bit | 8-bit |
|---|---|---|---|---|---|
| store table register | | | | | |
| store ZA row (unpredicated) | | | | | |
| store tile slice | | | | | |
| store strided registers | | | | | |

| Operation | Int 64-bit | Int 32-bit | Int 16-bit | Int 8-bit | FP double | FP single | FP half | BFloat16 |
|---|---|---|---|---|---|---|---|---|
| int to float | | | | | | | | |
| float to int | | | | | | | | |
| float to float | | | | | | | | |
| int to int | | | | | | | | |
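
For the "float to int" row, a sketch of the usual Arm behavior for a signed conversion such as FCVTZS may help: the value is rounded toward zero, out-of-range values saturate to the destination's minimum or maximum, and NaN converts to 0. The function `fcvtzs` below is a hypothetical scalar model that ignores floating-point exception flags.

```python
import math


def fcvtzs(x, bits=32):
    """Float -> signed integer: round toward zero, saturate, NaN -> 0 (model only)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(x):
        return 0
    if math.isinf(x):
        return hi if x > 0 else lo
    # math.trunc rounds toward zero; then clamp into the signed range
    return max(lo, min(hi, math.trunc(x)))


print(fcvtzs(-2.7))            # -2
print(fcvtzs(1e20))            # 2147483647
print(fcvtzs(float("nan")))    # 0
print(fcvtzs(300.5, bits=8))   # 127
```
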

| Operation | Int 64-bit | Int 32-bit | Int 16-bit | Int 8-bit | FP double | FP single | FP half | BFloat16 |
|---|---|---|---|---|---|---|---|---|
| add | | | | | | | | |
| clamp | | | | | | | | |
| max | | | | | | | | |
| min | | | | | | | | |
| round | | | | | | | | |
| select | | | | | | | | |
| mulh | | | | | | | | |
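
Two of these rows are easy to misread, so here is a small Python model of each. For "clamp", it assumes the SCLAMP-style operand order, where each destination element is clamped between the corresponding elements of a minimum vector and a maximum vector. For "mulh", it models a signed saturating doubling multiply returning the high half (in the style of SQDMULH) on 16-bit elements. Both functions are hypothetical sketches.

```python
def sclamp(d, lo, hi):
    """Per element: min(max(d, lo), hi)."""
    return [min(max(x, l), h) for x, l, h in zip(d, lo, hi)]


def sqdmulh16(a, b):
    """Signed saturating doubling multiply, returning the high half (16-bit)."""
    out = []
    for x, y in zip(a, b):
        r = (2 * x * y) >> 16  # Python >> on ints is an arithmetic (floor) shift
        out.append(max(-32768, min(32767, r)))  # only -32768 * -32768 saturates
    return out


print(sclamp([-5, 3, 99], [0, 0, 0], [10, 10, 10]))  # [0, 3, 10]
print(sqdmulh16([16384, -32768], [16384, -32768]))   # [8192, 32767]
```

Read as Q15 fixed point, the second result shows 0.5 * 0.5 = 0.25 (8192), and the one overflowing case, -1.0 * -1.0, saturating to just below 1.0.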

| Operation | 64-bit | 32-bit | 16-bit | 8-bit |
|---|---|---|---|---|
| shift right | | | | |
| shift left | | | | |
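
Many of the shift operations round and saturate rather than simply shifting. The sketch below models a saturating rounding shift right with narrowing from 32-bit to 16-bit elements, in the style of SQRSHR: half of the discarded range is added before an arithmetic shift, and the result is then saturated into the narrower type. The helpers `rounding_shift_right` and `sat_narrow` are hypothetical.

```python
def rounding_shift_right(x, shift):
    """Add half of the discarded range, then arithmetic shift right."""
    if shift == 0:
        return x
    return (x + (1 << (shift - 1))) >> shift


def sat_narrow(x, bits):
    """Saturate a signed integer into `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


# Saturating rounding shift right by 4, narrowing 32-bit -> 16-bit elements
src = [1000, -1000, 0x7FFFFFFF, 5]
print([sat_narrow(rounding_shift_right(v, 4), 16) for v in src])  # [63, -62, 32767, 0]
```

Note that -1000 / 16 = -62.5 rounds to -62: adding half before a floor shift rounds ties toward positive infinity.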

| Operation | 32-bit | 16-bit | 8-bit |
|---|---|---|---|
| table lookup (2-bit indices) | | | |
| table lookup (4-bit indices) | | | |
| zero table register | | | |
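
The table lookups use the SME2 table register (ZT0): each packed 2-bit or 4-bit index in the source vector selects one of 4 or 16 table entries. The sketch below shows only that core idea. Which bytes of ZT0 form the table, which segment of the source vector supplies the indices, and the least-significant-field-first packing order are all simplifying assumptions, and the function names are hypothetical.

```python
def unpack_indices(packed_bytes, bits):
    """Split each byte into 8 // bits indices, least-significant field first (assumed)."""
    mask = (1 << bits) - 1
    return [(b >> (i * bits)) & mask for b in packed_bytes for i in range(8 // bits)]


def lut(table, indices):
    """Replace each index with the table entry it selects."""
    return [table[i] for i in indices]


table4 = [i * i for i in range(16)]    # hypothetical 16-entry table for 4-bit indices
idx = unpack_indices([0x21, 0xF0], 4)  # [1, 2, 0, 15]
print(lut(table4, idx))                # [1, 4, 0, 225]
```
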

| Operation | Int 32-bit | Int 16-bit | Int 8-bit | FP double | FP single | FP half | BFloat16 |
|---|---|---|---|---|---|---|---|
| outer product and accumulate | | | | | | | |
| outer product and subtract | | | | | | | |
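
A minimal model of the non-widening outer product and accumulate (in the style of FMOPA on single-precision elements): every tile element ZA[i][j] gains zn[i] * zm[j] when both the row predicate element pn[i] and the column predicate element pm[j] are active, and is left unchanged otherwise. The subtract form subtracts the product instead. The function `mopa` is hypothetical, and the widening forms, which sum several narrower products into each tile element, are not modelled.

```python
def mopa(za, zn, zm, pn, pm, subtract=False):
    """Outer product and accumulate (or subtract) into a square tile, in place."""
    for i, (a, pa) in enumerate(zip(zn, pn)):
        for j, (b, pb) in enumerate(zip(zm, pm)):
            if pa and pb:  # inactive rows/columns keep their old values
                za[i][j] += -a * b if subtract else a * b
    return za


tile = [[0] * 3 for _ in range(3)]
mopa(tile, [1, 2, 3], [10, 20, 30], [True, True, False], [True, True, True])
print(tile)  # [[10, 20, 30], [20, 40, 60], [0, 0, 0]]
```
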

| Operation | Int 64-bit | Int 32-bit | Int 16-bit | Int 8-bit | FP double | FP single | FP half | BFloat16 |
|---|---|---|---|---|---|---|---|---|
| add | | | | | | | | |
| subtract | | | | | | | | |
| multiply-add | | | | | | | | |
| multiply-subtract | | | | | | | | |
| multiply-add long | | | | | | | | |
| multiply-subtract long | | | | | | | | |
| multiply-add long long | | | | | | | | |
| multiply-subtract long long | | | | | | | | |
| dot product | | | | | | | | |
| vertical dot product | | | | | | | | |
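
For the "dot product" row, a sketch of the common 8-bit to 32-bit form (as in SDOT): each 32-bit accumulator element adds the sum of four 8-bit products taken from the matching group of four source elements, and the accumulation wraps modulo 2^32 rather than saturating. The function `sdot` is hypothetical; the vertical dot product gathers its four inputs across registers instead and is not modelled here.

```python
def wrap32(v):
    """Wrap to a signed 32-bit value (integer accumulation is modular)."""
    return ((v + (1 << 31)) % (1 << 32)) - (1 << 31)


def sdot(acc, a, b):
    """4-way dot product: each 32-bit accumulator lane adds four 8-bit products."""
    assert len(a) == len(b) == 4 * len(acc)
    return [
        wrap32(acc[i] + sum(a[4 * i + k] * b[4 * i + k] for k in range(4)))
        for i in range(len(acc))
    ]


print(sdot([0, 100], [1, 2, 3, 4, -1, -1, -1, -1], [1, 1, 1, 1, 5, 5, 5, 5]))  # [10, 80]
```
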

| Operation | Instruction |
|---|---|
| Add multiple of streaming SVE mode predicate length in bytes | ADDSPL |
| Add multiple of streaming SVE mode vector length in bytes | ADDSVL |
| Get multiple of streaming SVE mode vector length in bytes | RDSVL |
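
These three instructions scale a signed immediate (in the range -32 to 31) by the streaming vector length or the streaming predicate length, which is one eighth of it because a predicate has one bit per vector byte. The sketch below assumes a 512-bit streaming vector length purely for illustration; real code reads the length at run time. The Python function names mirror the mnemonics but are hypothetical models.

```python
SVL_BITS = 512  # assumed streaming vector length for this example only


def rdsvl(imm, svl_bits=SVL_BITS):
    """RDSVL Xd, #imm: imm times the streaming vector length in bytes."""
    assert -32 <= imm <= 31
    return imm * (svl_bits // 8)


def addsvl(xn, imm, svl_bits=SVL_BITS):
    """ADDSVL Xd, Xn, #imm: Xn + imm * (streaming vector length in bytes)."""
    return xn + rdsvl(imm, svl_bits)


def addspl(xn, imm, svl_bits=SVL_BITS):
    """ADDSPL Xd, Xn, #imm: Xn + imm * (streaming predicate length in bytes)."""
    assert -32 <= imm <= 31
    return xn + imm * (svl_bits // 64)  # one predicate bit per vector byte


print(rdsvl(1))            # 64
print(addsvl(0x1000, 2))   # 4224
print(addspl(0x1000, -1))  # 4088
```

A typical use is stepping a pointer by whole vectors or predicates without knowing the vector length at compile time.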