Apple Microarchitecture Research by Dougall Johnson

STP (pre-index, 64-bit)

Test 1: uops

Code:

  stp x0, x1, [x6, #8]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 1929 | 2059 | 1041 | 1018 | 1040 | 1000 | 4665 | 18539 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1104 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 17775 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1095 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 17803 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1096 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 18305 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1097 | 2001 | 1001 | 1000 | 1000 | 1000 | 4669 | 18451 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1122 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 17894 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1129 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 18798 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1159 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 18803 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1096 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 18195 | 2000 | 1000 | 3000 | 1001 | 1000
1004 | 1147 | 2001 | 1001 | 1000 | 1000 | 1000 | 4665 | 17749 | 2000 | 1000 | 3000 | 1001 | 1000

Test 2: Latency 3->3

Code:

  stp x0, x1, [x6, #8]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0620

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10209 | 12022 | 20397 | 10307 | 10090 | 10307 | 10002 | 44675 | 180226 | 20108 | 200 | 10010 | 200 | 30030 | 10005 | 10000 | 100
10204 | 10608 | 20104 | 10104 | 10000 | 10104 | 10002 | 43546 | 180391 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10608 | 20104 | 10104 | 10000 | 10104 | 10002 | 43549 | 180481 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10606 | 20104 | 10104 | 10000 | 10104 | 10002 | 43542 | 180391 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10620 | 20104 | 10104 | 10000 | 10104 | 10002 | 43544 | 180193 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10612 | 20104 | 10104 | 10000 | 10104 | 10002 | 43542 | 180283 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10618 | 20104 | 10104 | 10000 | 10104 | 10002 | 43541 | 180643 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10627 | 20104 | 10104 | 10000 | 10104 | 10002 | 43549 | 180211 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10633 | 20104 | 10104 | 10000 | 10104 | 10002 | 43542 | 180589 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100
10204 | 10632 | 20104 | 10104 | 10000 | 10104 | 10002 | 43544 | 180319 | 20106 | 200 | 10008 | 200 | 30024 | 10004 | 10000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0605

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10029 | 11819 | 20311 | 10221 | 10090 | 10222 | 10002 | 42985 | 179941 | 20016 | 20 | 10008 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10596 | 20011 | 10011 | 10000 | 10010 | 10000 | 42970 | 180295 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10617 | 20011 | 10011 | 10000 | 10010 | 10000 | 42971 | 180079 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10604 | 20011 | 10011 | 10000 | 10010 | 10000 | 42975 | 180421 | 20010 | 20 | 10000 | 20 | 30144 | 10036 | 10000 | 10
10024 | 10624 | 20011 | 10011 | 10000 | 10010 | 10000 | 42975 | 179683 | 20010 | 20 | 10000 | 20 | 30024 | 10004 | 10000 | 10
10024 | 10592 | 20011 | 10011 | 10000 | 10010 | 10000 | 42926 | 179461 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10593 | 20011 | 10011 | 10000 | 10010 | 10000 | 42928 | 180379 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10627 | 20011 | 10011 | 10000 | 10010 | 10000 | 42951 | 180055 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10605 | 20011 | 10011 | 10000 | 10010 | 10000 | 42951 | 180397 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
10024 | 10614 | 20011 | 10011 | 10000 | 10010 | 10000 | 42949 | 180379 | 20010 | 20 | 10000 | 20 | 30000 | 10001 | 10000 | 10
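Test 2 chains every copy of the instruction through x6. A plausible reading of "3->3" is latency from operand 3 (the base register) to its own written-back value, though that interpretation is an assumption rather than something stated here. The toy Python model below sketches the architectural pre-index semantics to show why back-to-back copies are serially dependent. The helper names `stp_pre_index` and `run_chain` and the dictionary used as memory are hypothetical, introduced only for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def stp_pre_index(mem, base, imm, rt, rt2):
    """Toy model of `stp Xt, Xt2, [Xn, #imm]!` (hypothetical helper).

    Pre-index means the base is updated first and the store uses the
    updated address; the new base is written back to Xn.
    """
    addr = (base + imm) & MASK64
    mem[addr] = rt                      # first register at the new base
    mem[(addr + 8) & MASK64] = rt2      # second register 8 bytes above it
    return addr                         # value written back to Xn


def run_chain(start, imm, count):
    """Run `count` back-to-back pre-index stores through one base register."""
    mem = {}
    base = start
    bases = []
    for _ in range(count):
        # Each store consumes the base produced by the previous store,
        # so the base updates form a serial dependency chain.
        base = stp_pre_index(mem, base, imm, rt=0x1111, rt2=0x2222)
        bases.append(base)
    return bases, mem


bases, mem = run_chain(start=0x1000, imm=8, count=4)
print([hex(b) for b in bases])
```

With a 16-byte store and an 8-byte step, consecutive stores in this model overlap by 8 bytes. The measured figure of roughly 1.06 cycles per instruction would then suggest that the base update itself costs about one cycle per link in the chain.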

Test 3: throughput

Count: 8

Code:

  stp x0, x1, [x6, #8]!
  stp x0, x1, [x7, #8]!
  stp x0, x1, [x8, #8]!
  stp x0, x1, [x9, #8]!
  stp x0, x1, [x10, #8]!
  stp x0, x1, [x11, #8]!
  stp x0, x1, [x12, #8]!
  stp x0, x1, [x13, #8]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0135

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80209 | 82006 | 160398 | 80308 | 80090 | 80308 | 80002 | 240312 | 1378612 | 160106 | 200 | 80008 | 200 | 240120 | 80033 | 80000 | 100
80204 | 81083 | 160106 | 80106 | 80000 | 80106 | 80002 | 240312 | 1378501 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378661 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81081 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378591 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81073 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378697 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81073 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378661 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378519 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80682 | 85528 | 160625 | 80442 | 80183 | 80396 | 80002 | 240312 | 1378841 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100
80204 | 81072 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378483 | 160106 | 200 | 80008 | 200 | 240144 | 80037 | 80000 | 100
80204 | 81104 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1378501 | 160106 | 200 | 80008 | 200 | 240024 | 80005 | 80000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0137

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80029 | 81926 | 160305 | 80215 | 80090 | 80214 | 80002 | 240042 | 1378985 | 160016 | 20 | 80008 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80025 | 81200 | 160060 | 80043 | 80017 | 80046 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81102 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1379051 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240120 | 80033 | 80000 | 10
80024 | 81099 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1378979 | 160010 | 20 | 80000 | 20 | 240000 | 80001 | 80000 | 10
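The Test 3 results appear consistent with taking the median of the "cycle (02)" column over the ten runs and dividing by unrolls x iterations x count. The Python sketch below reproduces that arithmetic under that assumption; the cycle values are read from the Test 3 rows above, and the function name `cycles_per_instruction` is hypothetical. The page does not document exactly how the median is computed, so treat this as a plausibility check rather than the actual procedure.

```python
from statistics import median

# "cycle (02)" values read from the ten Test 3 rows of each configuration.
CYCLES_100X100 = [82006, 81083, 81083, 81081, 81073,
                  81073, 81083, 85528, 81072, 81104]
CYCLES_1000X10 = [81926, 81099, 81200, 81099, 81099,
                  81099, 81099, 81102, 81099, 81099]

COUNT = 8  # measured instructions per unrolled copy ("Count: 8")


def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Median cycle total divided by the number of measured instructions."""
    executed = unrolls * iterations * count
    return median(cycles) / executed


for name, cycles, unrolls, iterations in [
    ("100 unrolls x 100 iterations", CYCLES_100X100, 100, 100),
    ("1000 unrolls x 10 iterations", CYCLES_1000X10, 1000, 10),
]:
    value = cycles_per_instruction(cycles, unrolls, iterations, COUNT)
    print(f"{name}: {value:.4f}")
```

Using the median rather than the mean keeps outlier runs, such as the 85528-cycle row, from skewing the figure. The same calculation on the Test 2 rows lands close to, but not exactly on, the reported values, so the published results may come from separate runs or a slightly different median convention.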