Apple Microarchitecture Research by Dougall Johnson

STR (pre-index, 64-bit)

Test 1: uops

Code:

  str x0, [x6, #8]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters (event code in parentheses), in column order: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

  01    02    52    53    55    56    58    59     5a    78    7d    80    e9    ed
1005  1401  2059  1041  1018  1040  1000  4649  17749  2000  1000  2000  1001  1000
1004  1078  2001  1001  1000  1000  1000  4761  17551  2000  1000  2000  1001  1000
1004  1072  2001  1001  1000  1000  1000  4761  17515  2000  1000  2000  1001  1000
1004  1074  2001  1001  1000  1000  1000  4761  17497  2000  1000  2000  1001  1000
1004  1073  2001  1001  1000  1000  1000  4761  17551  2000  1000  2000  1001  1000
1004  1090  2001  1001  1000  1000  1000  4749  17479  2000  1000  2000  1001  1000
1004  1073  2001  1001  1000  1000  1000  4761  17533  2000  1000  2000  1001  1000
1004  1071  2001  1001  1000  1000  1000  4761  17839  2000  1000  2000  1001  1000
1004  1071  2001  1001  1000  1000  1000  4761  17551  2000  1000  2000  1001  1000
1004  1100  2001  1001  1000  1000  1000  4761  17515  2000  1000  2000  1001  1000
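
The per-instruction figures in Test 1 look like counter totals divided by the 1000 unrolled instructions. A minimal Python sketch of that arithmetic, using the second sample row. The page does not say which counter feeds which summary line or whether fixed overhead is subtracted, so the pairing suggested here is an assumption, and `per_instruction` is a hypothetical helper.

```python
# One steady-state sample from Test 1 (second row), keyed by counter name.
UNROLLS = 1000  # "1000 unrolls and 1 iteration"

sample = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 2001,
    "schedule int uop (53)": 1001,
    "schedule ldst uop (55)": 1000,
    "dispatch uop (78)": 2000,
}


def per_instruction(total, instructions=UNROLLS):
    """Counter total divided by the number of test instructions."""
    return total / instructions


# Assumption: each summary figure is a plain total / instruction count.
rates = {name: per_instruction(total) for name, total in sample.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Under this reading, schedule int uop (53) gives 1.001 and schedule ldst uop (55) gives 1.000, in line with the integer and load/store issue figures. retire uop (01) gives 1.004 rather than the reported 1.000, which suggests that some fixed overhead is removed or a different counter is used.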

Test 2: Latency 2->2

Code:

  str x0, [x6, #8]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.0091

Counters (event code in parentheses), in column order: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52     53     55     56     58      59      5a     78   7c     7d   7f     80     e9     ed   ef
10209  11254  20404  10314  10090  10314  10003  105656  170876  20109  200  10010  200  20020  10005  10000  100
10204  10096  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100
10204  10091  20104  10104  10000  10104  10002   43628  170881  20106  200  10008  200  20016  10004  10000  100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.0087

Counters (event code in parentheses), in column order: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02     52     53     55     56     58      59      5a     78  7c     7d  7f     80     e9     ed  ef
10029  11260  20306  10216  10090  10217  10002  101265  170832  20016  20  10008  20  20020  10001  10000  10
10024  10091  20011  10011  10000  10010  10000   43079  170785  20010  20  10000  20  20000  10001  10000  10
10024  10086  20011  10011  10000  10010  10000   42967  170875  20010  20  10000  20  20000  10001  10000  10
10024  10086  20011  10011  10000  10010  10000   43075  171163  20010  20  10000  20  20000  10001  10000  10
10024  10115  20011  10011  10000  10010  10000   42938  171181  20010  20  10000  20  20000  10001  10000  10
10024  10112  20011  10011  10000  10010  10000   43079  170965  20010  20  10000  20  20000  10001  10000  10
10024  10121  20011  10011  10000  10010  10000   43079  170785  20010  20  10000  20  20000  10001  10000  10
10024  10086  20011  10011  10000  10010  10000   43079  170803  20010  20  10000  20  20000  10001  10000  10
10024  10086  20011  10011  10000  10010  10000   43079  170803  20010  20  10000  20  20000  10001  10000  10
10024  10087  20011  10011  10000  10010  10000   43079  170803  20010  20  10000  20  20000  10001  10000  10
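
The two latency results in Test 2 can be reproduced from the cycle (02) samples. A minimal Python sketch, assuming the result is the median cycle count divided by unrolls times iterations. The lower median (`statistics.median_low`) happens to match both reported figures, but the page does not state how the median is taken, and `cycles_per_instruction` is a hypothetical helper.

```python
from statistics import median_low

# cycle (02) samples from the two Test 2 runs, in row order.
run_100x100 = [11254, 10096, 10091, 10091, 10091,
               10091, 10091, 10091, 10091, 10091]
run_1000x10 = [11260, 10091, 10086, 10086, 10115,
               10112, 10121, 10086, 10086, 10087]


def cycles_per_instruction(cycle_samples, unrolls, iterations):
    """Lower median of the samples divided by instructions executed."""
    return median_low(cycle_samples) / (unrolls * iterations)


latency_a = cycles_per_instruction(run_100x100, unrolls=100, iterations=100)
latency_b = cycles_per_instruction(run_1000x10, unrolls=1000, iterations=10)

print(f"{latency_a:.4f}")  # 1.0091
print(f"{latency_b:.4f}")  # 1.0087
```

Taking a median rather than a mean keeps the slower first sample of each run (11254 and 11260 cycles) from skewing the result.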

Test 3: throughput

Count: 8

Code:

  str x0, [x6, #8]!
  str x0, [x7, #8]!
  str x0, [x8, #8]!
  str x0, [x9, #8]!
  str x0, [x10, #8]!
  str x0, [x11, #8]!
  str x0, [x12, #8]!
  str x0, [x13, #8]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0011

Counters (event code in parentheses), in column order: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02      52     53     55     56     58      59       5a      78   7c     7d   7f      80     e9     ed   ef
80209  80981  160401  80311  80090  80311  80002  240312  1360108  160106  200  80008  200  160016  80005  80000  100
80204  80053  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80205  80115  160154  80137  80017  80140  80002  240312  1360157  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80036  240425  1360597  160178  200  80050  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1359997  160106  200  80008  200  160016  80005  80000  100
80204  80045  160105  80105  80000  80104  80002  240312  1360069  160106  200  80008  200  160016  80005  80000  100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0007

Counters (event code in parentheses), in column order: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

   01     02      52     53     55     56     58      59       5a      78  7c     7d  7f      80     e9     ed  ef
80029  80970  160305  80215  80090  80214  80002  240042  1360211  160016  20  80008  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80035  240149  1360807  160085  20  80048  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
80024  80056  160011  80011  80000  80010  80000  240030  1360205  160010  20  80000  20  160000  80001  80000  10
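
For the throughput test, "divided by count" adds one more divisor: the 8 stores in each unrolled block. A minimal Python sketch for the 1000-unroll run, assuming the result is the median cycle (02) sample divided by count times unrolls times iterations. The page does not spell the formula out, and `cycles_per_store` is a hypothetical helper.

```python
from statistics import median_low

COUNT = 8  # stores per unrolled block ("Count: 8")

# cycle (02) samples from the 1000 unrolls x 10 iterations run.
cycle_samples = [80970] + [80056] * 9


def cycles_per_store(samples, count, unrolls, iterations):
    """Median cycles divided by the total number of stores executed."""
    return median_low(samples) / (count * unrolls * iterations)


per_store = cycles_per_store(cycle_samples, COUNT, unrolls=1000, iterations=10)
print(f"{per_store:.4f}")  # 1.0007
```

The same arithmetic on the 100-unroll rows gives about 1.0006 rather than the reported 1.0011, so the page's exact method may differ from this sketch.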