Apple Microarchitecture Research by Dougall Johnson

STR (post-index, 64-bit)

Test 1: uops

Code:

  str x0, [x6], #8

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 1411 | 2059 | 1041 | 1018 | 1040 | 1000 | 4641 | 18325 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1074 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17479 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1074 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17497 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1077 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17569 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1107 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17623 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1078 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17569 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1077 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 18217 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1080 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17641 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1077 | 2001 | 1001 | 1000 | 1000 | 1000 | 4641 | 17695 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1076 | 2001 | 1001 | 1000 | 1000 | 1000 | 4629 | 18937 | 2000 | 1000 | 2000 | 1001 | 1000
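
The per-instruction figures in the Test 1 summary can be compared against the raw counters by dividing a sample row by the unroll count. The sketch below does this for the second sample row. It assumes the summary lines correspond to the retire and schedule counters named in the dictionary keys; the page does not state that mapping, and the names `UNROLLS`, `row`, and `per_instruction` are illustrative only. The scheduling counters come out at 2.001, 1.001, and 1.000, close to the reported 2.000, 1.001, and 1.000, while the retire counter gives 1.004 against the reported 1.000, so the published values may be computed with some correction that is not shown here.

```python
# Raw counter values copied from the second sample row of Test 1
# (1000 unrolled stores, no loop instructions).
UNROLLS = 1000  # hypothetical name for the unroll count stated in the test

row = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 2001,
    "schedule int uop (53)": 1001,
    "schedule ldst uop (55)": 1000,
}


def per_instruction(count, instructions=UNROLLS):
    """Normalize a raw event count to events per executed instruction."""
    return count / instructions


# Assumption: these counters are the ones behind the summary lines.
per_insn = {name: per_instruction(value) for name, value in row.items()}

for name, value in per_insn.items():
    print(f"{name}: {value:.3f}")
```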

Test 2: Latency 2->2

Code:

  str x0, [x6], #8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0089

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10209 | 11206 | 20400 | 10310 | 10090 | 10310 | 10003 | 105464 | 171163 | 20109 | 200 | 10010 | 200 | 20020 | 10001 | 10000 | 100
10204 | 10088 | 20104 | 10104 | 10000 | 10104 | 10002 | 43508 | 170881 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10090 | 20104 | 10104 | 10000 | 10104 | 10002 | 43510 | 170809 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10091 | 20104 | 10104 | 10000 | 10104 | 10002 | 43510 | 170935 | 20106 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 10088 | 20104 | 10104 | 10000 | 10104 | 10002 | 43509 | 170863 | 20106 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 10088 | 20104 | 10104 | 10000 | 10104 | 10002 | 43511 | 170899 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10086 | 20104 | 10104 | 10000 | 10104 | 10002 | 43510 | 170827 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10088 | 20104 | 10104 | 10000 | 10104 | 10002 | 43511 | 170845 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10099 | 20104 | 10104 | 10000 | 10104 | 10002 | 43516 | 171511 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 10089 | 20104 | 10104 | 10000 | 10104 | 10002 | 43510 | 170809 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0094

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10030 | 12006 | 20353 | 10246 | 10107 | 10250 | 10002 | 60820 | 170904 | 20016 | 20 | 10008 | 20 | 20000 | 10001 | 10000 | 10
10025 | 10225 | 20064 | 10047 | 10017 | 10052 | 10000 | 60522 | 171078 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10105 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 171199 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10103 | 20011 | 10011 | 10000 | 10010 | 10000 | 43069 | 170839 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10130 | 20011 | 10011 | 10000 | 10010 | 10000 | 43078 | 171337 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10111 | 20011 | 10011 | 10000 | 10010 | 10000 | 43074 | 171307 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10098 | 20011 | 10011 | 10000 | 10010 | 10000 | 43075 | 170941 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10108 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 170929 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10094 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 170875 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 10094 | 20011 | 10011 | 10000 | 10010 | 10000 | 43071 | 170929 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10

Test 3: throughput

Count: 8

Code:

  str x0, [x6], #8
  str x0, [x7], #8
  str x0, [x8], #8
  str x0, [x9], #8
  str x0, [x10], #8
  str x0, [x11], #8
  str x0, [x12], #8
  str x0, [x13], #8
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0006

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80209 | 81115 | 160401 | 80311 | 80090 | 80311 | 80002 | 240312 | 1360162 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80056 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80048 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360051 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80205 | 80107 | 160154 | 80137 | 80017 | 80140 | 80002 | 240312 | 1360211 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0006

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80029 | 81493 | 160305 | 80215 | 80090 | 80214 | 80002 | 240042 | 1360051 | 160016 | 20 | 80008 | 20 | 160176 | 80085 | 80000 | 10
80024 | 80048 | 160015 | 80015 | 80000 | 80014 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360171 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360045 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80048 | 160011 | 80011 | 80000 | 80010 | 80035 | 240149 | 1360647 | 160085 | 20 | 80048 | 20 | 160000 | 80001 | 80000 | 10
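
The Test 3 figure "median cycles for code divided by count" appears to be recoverable from the cycle (02) column. The sketch below takes the ten cycle samples of each run, takes their median, and divides by the total number of stores executed (count × unrolls × iterations). That normalization is an assumption inferred from the numbers rather than something the page states, and the function name `cycles_per_instruction` is hypothetical. Under that assumption both runs give 1.0006 cycles per store, matching the reported result.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, count, unrolls, iterations):
    """Median cycle count divided by the number of measured instructions.

    Assumption: the measured instruction total is count * unrolls * iterations.
    """
    return median(cycle_samples) / (count * unrolls * iterations)


# cycle (02) values from the Test 3 sample rows, in the order listed.
run_100x100 = [81115, 80056, 80048, 80048, 80048,
               80048, 80048, 80048, 80048, 80107]
run_1000x10 = [81493, 80048, 80048, 80048, 80048,
               80048, 80048, 80048, 80048, 80048]

result_100x100 = cycles_per_instruction(run_100x100, count=8, unrolls=100, iterations=100)
result_1000x10 = cycles_per_instruction(run_1000x10, count=8, unrolls=1000, iterations=10)

print(f"100 unrolls x 100 iterations: {result_100x100:.4f}")
print(f"1000 unrolls x 10 iterations: {result_1000x10:.4f}")
```

The median is used rather than the mean so that the slower first sample of each run, which likely includes warm-up effects, does not skew the figure.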