Apple Microarchitecture Research by Dougall Johnson

STR (pre-index, 32-bit)

Test 1: uops

Code:

  str w0, [x6, #8]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
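
One plausible reading of the two issues per instruction (one integer, one load/store) is that the pre-index form does two things: it updates the base register and it stores a 32-bit value at the updated address. The sketch below models only the architectural behaviour of `str w0, [x6, #8]!`; it says nothing about how the core actually splits the work. The helper `str_pre_index_32`, the register dictionary, the bytearray memory and the little-endian byte order are assumptions made for illustration and are not part of the original page.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def str_pre_index_32(regs, mem, rt, rn, imm):
    """Hypothetical model of `str w<rt>, [x<rn>, #imm]!` (pre-index, 32-bit)."""
    addr = (regs[rn] + imm) & MASK64               # pre-index: address = base + offset
    value = regs[rt] & 0xFFFFFFFF                  # only the low 32 bits (w register) are stored
    mem[addr:addr + 4] = struct.pack("<I", value)  # assumed little-endian 32-bit store
    regs[rn] = addr                                # write the updated address back to the base
    return addr

regs = {"x0": 0x1122334455667788, "x6": 16}
mem = bytearray(64)

# Two back-to-back stores: the second one needs the x6 value written by the first.
str_pre_index_32(regs, mem, "x0", "x6", 8)
str_pre_index_32(regs, mem, "x0", "x6", 8)
print(regs["x6"], mem[24:28].hex(), mem[32:36].hex())
```

The write-back to the base register is what chains consecutive instances of the instruction together when they all use `x6`.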

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 1351 | 2059 | 1041 | 1018 | 1040 | 1000 | 4645 | 17749 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1077 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17695 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1078 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 18181 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1078 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 18595 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1069 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17659 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1116 | 2001 | 1001 | 1000 | 1000 | 1000 | 4653 | 18361 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1082 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17749 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1084 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17749 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1099 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17623 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1077 | 2001 | 1001 | 1000 | 1000 | 1000 | 4649 | 17641 | 2000 | 1000 | 2000 | 1001 | 1000

Test 2: Latency 2->2

Code:

  str w0, [x6, #8]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0139

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10209 | 11238 | 20404 | 10314 | 10090 | 10314 | 10003 | 60816 | 171381 | 20109 | 200 | 10010 | 200 | 20020 | 10007 | 10000 | 100
10204 | 10109 | 20105 | 10105 | 10000 | 10106 | 10001 | 43546 | 171418 | 20105 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10119 | 20104 | 10104 | 10000 | 10104 | 10002 | 43551 | 171115 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10121 | 20104 | 10104 | 10000 | 10104 | 10002 | 43550 | 171313 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10109 | 20104 | 10104 | 10000 | 10104 | 10002 | 43550 | 171259 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10442 | 20101 | 10101 | 10000 | 10100 | 10003 | 51404 | 171255 | 20109 | 200 | 10010 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10087 | 20104 | 10104 | 10000 | 10104 | 10001 | 43509 | 171598 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 10118 | 20103 | 10103 | 10000 | 10104 | 10001 | 43520 | 171868 | 20105 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10167 | 20105 | 10105 | 10000 | 10106 | 10003 | 104590 | 171633 | 20109 | 200 | 10010 | 200 | 20020 | 10001 | 10000 | 100
10204 | 10138 | 20104 | 10104 | 10000 | 10104 | 10002 | 43514 | 171727 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0148

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10029 | 11284 | 20307 | 10217 | 10090 | 10218 | 10002 | 43012 | 172321 | 20016 | 20 | 10008 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10155 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171901 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10148 | 20011 | 10011 | 10000 | 10010 | 10000 | 42995 | 171847 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10146 | 20011 | 10011 | 10000 | 10010 | 10000 | 42995 | 171847 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10145 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171901 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10145 | 20011 | 10011 | 10000 | 10010 | 10000 | 42995 | 171847 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10148 | 20011 | 10011 | 10000 | 10010 | 10000 | 42995 | 171847 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10149 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171991 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10148 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171901 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10142 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171901 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10

Test 3: throughput

Count: 8

Code:

  str w0, [x6, #8]!
  str w0, [x7, #8]!
  str w0, [x8, #8]!
  str w0, [x9, #8]!
  str w0, [x10, #8]!
  str w0, [x11, #8]!
  str w0, [x12, #8]!
  str w0, [x13, #8]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0007

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80209 | 80968 | 160401 | 80311 | 80090 | 80311 | 80035 | 240419 | 1360781 | 160175 | 200 | 80048 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
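
The 1.0007 reported for this run is consistent with taking the median of the ten cycle (02) values above and dividing by unrolls × iterations × count. A minimal sketch of that arithmetic follows. The normalization is an assumption about how the figure is derived, not something the page states, and the function name `cycles_per_instruction` is hypothetical. The published results may come from separate runs, so other tables need not reproduce exactly.

```python
from statistics import median

# cycle (02) column for the ten runs of the 100-unroll, 100-iteration throughput test
cycles = [80968, 80056, 80056, 80056, 80056, 80056, 80056, 80056, 80056, 80056]

def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Assumed normalization: median cycles over all measured instructions."""
    return median(cycle_samples) / (unrolls * iterations * count)

result = cycles_per_instruction(cycles, unrolls=100, iterations=100, count=8)
print(f"{result:.4f}")  # prints 1.0007
```

Using the median rather than the mean keeps the slower first run (80968 cycles) from skewing the figure.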

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0011

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80029 | 80951 | 160305 | 80215 | 80090 | 80214 | 80002 | 240042 | 1360157 | 160016 | 20 | 80008 | 20 | 160096 | 80037 | 80000 | 10
80024 | 80053 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160096 | 80037 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80045 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1359991 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10