Apple Microarchitecture Research by Dougall Johnson

STR (pre-index, Q)

Test 1: uops

Code:

  str q0, [x6, #0x10]!

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 2.000

Integer unit issues: 1.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 1440 | 2059 | 1041 | 1018 | 1040 | 1000 | 4697 | 18415 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1108 | 2001 | 1001 | 1000 | 1000 | 1000 | 4681 | 19063 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1114 | 2001 | 1001 | 1000 | 1000 | 1000 | 4681 | 18127 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1113 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 19909 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1162 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 19297 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1145 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 18901 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1138 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 18217 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1121 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 18109 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1138 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 19243 | 2000 | 1000 | 2000 | 1001 | 1000
1004 | 1138 | 2001 | 1001 | 1000 | 1000 | 1000 | 4693 | 18127 | 2000 | 1000 | 2000 | 1001 | 1000
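
The per-instruction figures for the integer and load/store units in Test 1 appear consistent with taking the median of a counter over the ten samples and dividing by the 1000 unrolled copies. The sketch below checks that arithmetic. It assumes the fourth and fifth fields of each sample are "schedule int uop (53)" and "schedule ldst uop (55)"; the helper name per_instruction is hypothetical, and the page does not state how its summary lines are actually computed.

```python
from statistics import median

UNROLLS = 1000  # "1000 unrolls and 1 iteration": the store runs 1000 times per sample

# Values from the ten samples above, under the ASSUMPTION that fields 4 and 5
# of each row are "schedule int uop (53)" and "schedule ldst uop (55)".
schedule_int = [1041, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001]
schedule_ldst = [1018, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]


def per_instruction(samples, executions):
    """Median counter value divided by the number of executed instructions."""
    return median(samples) / executions


int_issues = per_instruction(schedule_int, UNROLLS)
ldst_issues = per_instruction(schedule_ldst, UNROLLS)

print(f"Integer unit issues: {int_issues:.3f}")
print(f"Load/store unit issues: {ldst_issues:.3f}")
```

Using the median rather than the mean keeps the first, warm-up sample (1041 and 1018 here) from skewing the result.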

Test 2: Latency 3->3

Code:

  str q0, [x6, #0x10]!

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.1251

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10214 | 15484 | 20701 | 10521 | 10180 | 10523 | 10002 | 44297 | 191762 | 20108 | 200 | 10010 | 200 | 20020 | 10003 | 10000 | 100
10204 | 11295 | 20104 | 10104 | 10000 | 10107 | 10001 | 43606 | 192180 | 20105 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 11300 | 20104 | 10104 | 10000 | 10104 | 10001 | 43615 | 192324 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 11255 | 20104 | 10104 | 10000 | 10104 | 10002 | 43577 | 191473 | 20106 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 11290 | 20103 | 10103 | 10000 | 10104 | 10000 | 43581 | 192853 | 20100 | 200 | 10004 | 200 | 20016 | 10004 | 10000 | 100
10204 | 11418 | 20104 | 10104 | 10000 | 10104 | 10001 | 43611 | 192818 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 11270 | 20104 | 10104 | 10000 | 10104 | 10002 | 43601 | 191995 | 20106 | 200 | 10008 | 200 | 20016 | 10004 | 10000 | 100
10204 | 11281 | 20103 | 10103 | 10000 | 10104 | 10002 | 43615 | 191365 | 20106 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 11256 | 20104 | 10104 | 10000 | 10104 | 10001 | 43615 | 192216 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100
10204 | 11229 | 20104 | 10104 | 10000 | 10104 | 10001 | 43622 | 191298 | 20105 | 200 | 10008 | 200 | 20016 | 10003 | 10000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.1251

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10034 | 15922 | 20609 | 10429 | 10180 | 10430 | 10003 | 44110 | 190676 | 20019 | 20 | 10010 | 20 | 20016 | 10003 | 10000 | 10
10024 | 11307 | 20013 | 10013 | 10000 | 10016 | 10001 | 43104 | 192290 | 20015 | 20 | 10008 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11237 | 20011 | 10011 | 10000 | 10010 | 10000 | 43094 | 191675 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11193 | 20011 | 10011 | 10000 | 10010 | 10000 | 43088 | 192017 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11243 | 20011 | 10011 | 10000 | 10010 | 10000 | 43094 | 192359 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11260 | 20011 | 10011 | 10000 | 10010 | 10000 | 43090 | 191081 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11286 | 20011 | 10011 | 10000 | 10010 | 10000 | 43093 | 192125 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11248 | 20011 | 10011 | 10000 | 10010 | 10000 | 43090 | 191459 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11242 | 20011 | 10011 | 10000 | 10010 | 10000 | 43089 | 191297 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10
10024 | 11241 | 20011 | 10011 | 10000 | 10010 | 10000 | 42942 | 191999 | 20010 | 20 | 10000 | 20 | 20000 | 10001 | 10000 | 10

Test 3: throughput

Count: 8

Code:

  str q0, [x6, #0x10]!
  str q0, [x7, #0x10]!
  str q0, [x8, #0x10]!
  str q0, [x9, #0x10]!
  str q0, [x10, #0x10]!
  str q0, [x11, #0x10]!
  str q0, [x12, #0x10]!
  str q0, [x13, #0x10]!
  mov x7, x6
  mov x8, x6
  mov x9, x6
  mov x10, x6
  mov x11, x6
  mov x12, x6
  mov x13, x6

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 1.0010

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80214 | 82361 | 160692 | 80512 | 80180 | 80511 | 80002 | 240312 | 1360648 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80035 | 240419 | 1361383 | 160175 | 200 | 80048 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80084 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100
80204 | 80083 | 160105 | 80105 | 80000 | 80104 | 80002 | 240312 | 1360697 | 160106 | 200 | 80008 | 200 | 160016 | 80005 | 80000 | 100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 1.0007

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80034 | 82060 | 160612 | 80432 | 80180 | 80432 | 80002 | 240042 | 1360147 | 160016 | 20 | 80008 | 20 | 160016 | 80005 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80035 | 240149 | 1360807 | 160085 | 20 | 80048 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
80024 | 80056 | 160011 | 80011 | 80000 | 80010 | 80000 | 240030 | 1360205 | 160010 | 20 | 80000 | 20 | 160000 | 80001 | 80000 | 10
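
The two Test 3 results ("median cycles for code divided by count") can be reproduced from the samples above with the following sketch. It assumes the second field of each sample is "cycle (02)" and that the total number of measured stores is unrolls x iterations x count. The function name cycles_per_instruction is hypothetical and not part of the original test harness.

```python
from statistics import median

COUNT = 8  # "Count: 8": eight STR instructions per unrolled copy


def cycles_per_instruction(cycles, unrolls, iterations, count=COUNT):
    """Median cycle count divided by the total number of measured instructions."""
    return median(cycles) / (unrolls * iterations * count)


# Cycle values from the ten samples of each configuration, under the
# ASSUMPTION that the second field of every row is "cycle (02)".
cycles_100x100 = [82361, 80083, 80083, 80083, 80083,
                  80083, 80083, 80083, 80084, 80083]
cycles_1000x10 = [82060] + [80056] * 9

result_100x100 = cycles_per_instruction(cycles_100x100, unrolls=100, iterations=100)
result_1000x10 = cycles_per_instruction(cycles_1000x10, unrolls=1000, iterations=10)

print(f"100 unrolls and 100 iterations: {result_100x100:.4f}")
print(f"1000 unrolls and 10 iterations: {result_1000x10:.4f}")
```

Both values round to the figures reported for Test 3 (1.0010 and 1.0007), or about one pre-indexed Q-register store per cycle.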