Apple Microarchitecture Research by Dougall Johnson


STR (post-index, 32-bit)

Test 1: uops

Code:

  str w0, [x6], #8

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

01    02    52    53    55    56    58    59    5a     78    7d    80    e9    ed
1005  1269  2059  1041  1018  1040  1000  4641  18055  2000  1000  2000  1001  1000
1004  1074  2001  1001  1000  1000  1000  4641  17497  2000  1000  2000  1001  1000
1004  1103  2001  1001  1000  1000  1000  4637  17587  2000  1000  2000  1001  1000
1004  1074  2001  1001  1000  1000  1000  4641  17551  2000  1000  2000  1001  1000
1004  1106  2001  1001  1000  1000  1000  4621  17713  2000  1000  2000  1001  1000
1004  1101  2001  1001  1000  1000  1000  4641  18235  2000  1000  2000  1001  1000
1004  1074  2001  1001  1000  1000  1000  4637  17659  2000  1000  2000  1001  1000
1004  1078  2001  1001  1000  1000  1000  4641  17659  2000  1000  2000  1001  1000
1004  1116  2001  1001  1000  1000  1000  4641  17641  2000  1000  2000  1001  1000
1004  1065  2001  1001  1000  1000  1000  4641  17587  2000  1000  2000  1001  1000
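As a rough cross-check of the Test 1 summary, you can divide the median of a few counter columns by the 1000 unrolled instructions and recover the reported figures. The column-to-summary mapping is an assumption, not something the page states. Dispatch uop (78) is taken to feed "Issues", and schedule int uop (53) and schedule ldst uop (55) the two unit lines, because that mapping matches the numbers.

```python
from statistics import median

# Columns copied from the ten Test 1 runs (one value per run).
schedule_int = [1041, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001]   # counter 53
schedule_ldst = [1018, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]  # counter 55
dispatch = [2000] * 10                                                          # counter 78

INSTRUCTIONS = 1000  # 1000 unrolls, 1 iteration, no loop instructions


def per_instruction(samples):
    """Median counter value divided by the number of measured instructions."""
    return median(samples) / INSTRUCTIONS


# Assumed mapping of counters to the summary lines (see lead-in).
print(f"Issues: {per_instruction(dispatch):.3f}")
print(f"Integer unit issues: {per_instruction(schedule_int):.3f}")
print(f"Load/store unit issues: {per_instruction(schedule_ldst):.3f}")
```

The median discards the noisy first run (for example, 1041 int uops), which is why the result is 1.001 and not the mean of about 1.005.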

Test 2: Latency 2->2

Code:

  str w0, [x6], #8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0232

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

01     02     52     53     55     56     58     59     5a      78     7c   7d     7f   80     e9     ed     ef
10209  11197  20400  10310  10090  10310  10003  57874  171934  20109  200  10010  200  20020  10005  10000  100
10204  10150  20104  10104  10000  10104  10002  43554  171925  20106  200  10008  200  20016  10004  10000  100
10204  10149  20104  10104  10000  10104  10002  43554  171925  20106  200  10008  200  20016  10004  10000  100
10204  10149  20104  10104  10000  10104  10002  43554  171925  20106  200  10008  200  20016  10004  10000  100
10204  10149  20104  10104  10000  10104  10036  46158  173181  20179  202  10049  200  20016  10004  10000  100
10204  10149  20104  10104  10000  10104  10002  43554  171925  20106  200  10008  200  20016  10004  10000  100
10204  10149  20104  10104  10000  10104  10002  43541  172177  20106  200  10008  200  20016  10004  10000  100
10204  10163  20104  10104  10000  10104  10002  43549  172397  20106  200  10008  200  20016  10004  10000  100
10204  10164  20104  10104  10000  10104  10002  43559  172541  20106  200  10008  200  20016  10004  10000  100
10204  10310  20103  10103  10000  10104  10003  55906  174472  20109  200  10010  200  20020  10005  10000  100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0104

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

01     02     52     53     55     56     58     59     5a      78     7c  7d     7f  80     e9     ed     ef
10029  11158  20304  10214  10090  10214  10038  75278  172827  20092  20  10048  20  20000  10001  10000  10
10024  10087  20011  10011  10000  10010  10036  95561  172191  20086  20  10040  20  20000  10001  10000  10
10025  10178  20063  10046  10017  10050  10072  99605  174101  20162  20  10080  20  20000  10001  10000  10
10024  10102  20011  10011  10000  10010  10000  42958  170893  20010  20  10000  20  20000  10001  10000  10
10024  10102  20011  10011  10000  10010  10000  42958  170893  20010  20  10000  20  20000  10001  10000  10
10024  10102  20011  10011  10000  10010  10000  42958  170911  20010  20  10000  20  20000  10001  10000  10
10024  10092  20011  10011  10000  10010  10000  42958  171073  20010  20  10000  20  20000  10001  10000  10
10024  10102  20011  10011  10000  10010  10000  42958  171073  20010  20  10000  20  20000  10001  10000  10
10024  10092  20011  10011  10000  10010  10000  42958  170911  20010  20  10000  20  20000  10001  10000  10
10024  10092  20011  10011  10000  10010  10000  42958  170893  20010  20  10000  20  20000  10001  10000  10

Test 3: throughput

Count: 8

Code:

  str w0, [x6], #8
  str w0, [x7], #8
  str w0, [x8], #8
  str w0, [x9], #8
  str w0, [x10], #8
  str w0, [x11], #8
  str w0, [x12], #8
  str w0, [x13], #8
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0006

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

01     02     52      53     55     56     58     59      5a       78      7c   7d     7f   80      e9     ed     ef
80209  80839  160401  80311  80090  80311  80002  240312  1360157  160106  200  80008  200  160096  80045  80000  100
80204  80053  160105  80105  80000  80104  80074  240552  1364700  160258  200  80088  200  160176  80085  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160096  80045  80000  100
80204  80140  160163  80145  80018  80144  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80914  160453  80345  80108  80344  80110  240672  1364371  160334  200  80128  200  160096  80045  80000  100
80204  80045  160105  80105  80000  80104  80075  240558  1362557  160261  200  80090  200  160096  80045  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1360141  160106  200  80008  200  160096  80045  80000  100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0006

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

01     02     52      53     55     56     58     59      5a       78      7c  7d     7f  80      e9     ed     ef
80029  81010  160305  80215  80090  80214  80038  240162  1361784  160092  20  80048  20  160000  80001  80000  10
80024  80045  160011  80011  80000  80010  80036  240155  1360669  160088  20  80050  20  160080  80041  80000  10
80024  80045  160011  80011  80000  80010  80000  240030  1359991  160010  20  80000  20  160000  80001  80000  10
80024  80045  160011  80011  80000  80010  80000  240030  1360027  160010  20  80000  20  160000  80001  80000  10
80024  80045  160011  80011  80000  80010  80036  240150  1362187  160086  20  80040  20  160000  80001  80000  10
80024  80045  160011  80011  80000  80010  80036  240150  1361071  160086  20  80040  20  160336  80157  80000  10
80024  80045  160011  80011  80000  80010  80000  240030  1360063  160010  20  80000  20  160000  80001  80000  10
80024  80048  160011  80011  80000  80010  80000  240030  1359991  160010  20  80000  20  160160  80081  80000  10
80024  80045  160011  80011  80000  80010  80035  240149  1360683  160085  20  80048  20  160160  80081  80000  10
80024  80045  160011  80011  80000  80010  80000  240030  1359991  160010  20  80000  20  160000  80001  80000  10
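The Test 3 result can be recomputed from the cycle counter (02), assuming it is the median of that column divided by unrolls × iterations × count. This formula is an assumption. Only the eight STR instructions are counted; the seven MOVs per copy are not. Both configurations have a median of 80045 cycles for 80000 stores, which gives the reported 1.0006.

```python
from statistics import median

# Counter 02 (cycle) from the two Test 3 runs.
cycles_100x100 = [80839, 80053, 80045, 80140, 80045, 80045, 80914, 80045, 80045, 80045]
cycles_1000x10 = [81010, 80045, 80045, 80045, 80045, 80045, 80045, 80048, 80045, 80045]

COUNT = 8  # STR instructions per unrolled copy; the MOVs are not counted


def cycles_per_store(cycles, unrolls, iterations):
    """Median cycle count divided by the total number of STRs executed (assumed formula)."""
    return median(cycles) / (unrolls * iterations * COUNT)


print(f"{cycles_per_store(cycles_100x100, 100, 100):.4f}")
print(f"{cycles_per_store(cycles_1000x10, 1000, 10):.4f}")
```

At about one cycle per store with eight independent base registers, the figure reflects store throughput rather than the latency of the writeback chain.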