Apple M1 Microarchitecture Research by Dougall Johnson
This is an early attempt at microarchitecture documentation for the CPU in the Apple M1, inspired by and building on the amazing work of Andreas Abel, Andrei Frumusanu, @Veedrac, Travis Downs, Henry Wong and Agner Fog. This documentation is my best effort, but it is based on black-box reverse engineering, and there are definitely mistakes. No warranty of any kind (and not just as a legal technicality). To make it easier to verify the information and/or identify such errors, entries in the instruction tables link to the experiments and results (~35k tables of counter values).
Icestorm is the high-efficiency microarchitecture used by the four E-cores in the M1. Low-power ARM cores are generally less novel, so the notes here are less thorough.
The execution units are referred to here as "units" to try to avoid confusion if Apple releases official documentation, which will probably call them "ports" or "pipes" and order them differently. (If this just causes more confusion, I apologise.)
Integer units:
1: alu + br + mrs
2: alu + br + div + ptrauth
3: alu + mul + bfm + crc

Load and store units (up to 128-bit loads and stores, including address generation with shifts up to LSL #3):
4: load/store + amx
5: load

FP/SIMD units:
6: fp/simd
7: fp/simd + fdiv + to-int + div + recp + sqrt + sha + jcvtzs
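One way to read the unit table is to count how many units list each capability: three units can do alu work, two can load, and only unit 4 can store. The sketch below does that counting, assuming (the table doesn't say so) that each unit accepts at most one uop per cycle. `ICESTORM_UNITS`, `units_for`, `max_per_cycle` and `cycles_lower_bound` are hypothetical names, and splitting "load/store" on unit 4 into separate "load" and "store" labels is an interpretation.

```python
# Hypothetical encoding of the Icestorm execution-unit table.
# Assumption (not stated in the table): each unit accepts at most one uop
# per cycle, so counting the units that list a capability gives an upper
# bound on that capability's throughput.
# Labels are copied from the table, except "load/store" on unit 4, which is
# split into "load" and "store". "div" appears on both unit 2 (integer) and
# unit 7 (FP/SIMD), so counting "div" mixes the two classes.
ICESTORM_UNITS: dict[int, set[str]] = {
    1: {"alu", "br", "mrs"},
    2: {"alu", "br", "div", "ptrauth"},
    3: {"alu", "mul", "bfm", "crc"},
    4: {"load", "store", "amx"},
    5: {"load"},
    6: {"fp/simd"},
    7: {"fp/simd", "fdiv", "to-int", "div", "recp", "sqrt", "sha", "jcvtzs"},
}


def units_for(capability: str) -> list[int]:
    """Unit numbers whose capability set includes `capability`."""
    return sorted(u for u, caps in ICESTORM_UNITS.items() if capability in caps)


def max_per_cycle(capability: str) -> int:
    """Upper bound on uops per cycle, under the one-uop-per-unit assumption."""
    return len(units_for(capability))


def cycles_lower_bound(uop_counts: dict[str, int]) -> float:
    """Lower bound on cycles per iteration for a uop mix keyed by capability.

    Each capability is bounded on its own; classes that share a unit
    (alu and mul both use unit 3) can only make the real figure worse.
    """
    bound = 0.0
    for capability, count in uop_counts.items():
        units = max_per_cycle(capability)
        if units == 0:
            raise ValueError(f"no unit lists {capability!r}")
        bound = max(bound, count / units)
    return bound


print(units_for("alu"))                                       # [1, 2, 3]
print(max_per_cycle("load"), max_per_cycle("store"))          # 2 1
print(cycles_lower_bound({"alu": 6, "load": 2, "store": 2}))  # 2.0
```

Under this model, a loop body with six ALU uops, two loads and two stores can't run faster than two cycles per iteration, limited equally by the three ALUs and the single store unit. Bounding each capability separately keeps the sketch simple; real contention between classes that share a unit can only push the figure higher.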
movk (pair only, but any shift on both) and
adrp as well as the Firestorm things.
Mostly the same as Firestorm. Icestorm also has
movk elimination, but still
add fusion (although it is one uop on account of
Several instructions have latencies that aren't adequately described in the instruction tables:
These numbers mostly come from my M1 buffer size measuring tool. The M1 seems to use something other than an entirely conventional reorder buffer, which complicates the measurements, so these numbers may not be accurate. (This paragraph previously said "it seems to use something along the lines of a validation buffer". I think the VB hypothesis has since been disproven.)
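The tool itself isn't described in this section. Assuming it works along the lines of Henry Wong's reorder-buffer capacity test (two independent cache-missing loads separated by N filler instructions, timed as N grows), the analysis step is finding where the timing curve steps up. A minimal sketch of that step, with a hypothetical `estimate_capacity` helper and synthetic timings rather than M1 measurements:

```python
from typing import Optional


def estimate_capacity(timings: dict[int, float], ratio: float = 1.5) -> Optional[int]:
    """Estimate a buffer's capacity from per-iteration timings (hypothetical helper).

    `timings` maps filler-instruction count N to time per iteration.
    While the fillers fit, the two long-latency loads overlap and the time
    stays near the baseline (the smallest N); once they don't, the loads
    serialize and the time jumps. Returns the largest N before the first
    timing above `ratio` * baseline, or None if no jump is seen.
    """
    counts = sorted(timings)
    if not counts:
        return None
    baseline = timings[counts[0]]
    previous = counts[0]
    for n in counts[1:]:
        if timings[n] > ratio * baseline:
            return previous
        previous = n
    return None


# Synthetic timings with a single step between N=100 and N=110 (not M1 data).
synthetic = {n: (300.0 if n <= 100 else 600.0) for n in range(0, 201, 10)}
print(estimate_capacity(synthetic))  # 100
```

A single clean step is exactly what a less conventional structure can muddy: if some filler instructions don't occupy an entry, or different resources run out at different N, the curve can show several steps or none, and a single-threshold heuristic like this one will report a misleading number.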