Apple Microarchitecture Research by Dougall Johnson

LDR (literal, 32-bit)

Test 1: uops

Code:

  ldr w0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

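retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 950 | 1001 | 1 | 1000 | 1000 | 12324 | 1000 | 1000 | 1 | 1000
1004 | 765 | 1001 | 1 | 1000 | 1000 | 12169 | 1000 | 1000 | 1 | 1000
1004 | 753 | 1001 | 1 | 1000 | 1000 | 11901 | 1000 | 1000 | 1 | 1000
1004 | 743 | 1001 | 1 | 1000 | 1000 | 11494 | 1000 | 1000 | 1 | 1000
1004 | 758 | 1001 | 1 | 1000 | 1000 | 11973 | 1000 | 1000 | 1 | 1000
1004 | 759 | 1001 | 1 | 1000 | 1000 | 11530 | 1000 | 1000 | 1 | 1000
1004 | 747 | 1001 | 1 | 1000 | 1000 | 11557 | 1000 | 1000 | 1 | 1000
1004 | 756 | 1001 | 1 | 1000 | 1000 | 11746 | 1000 | 1000 | 1 | 1000
1004 | 756 | 1001 | 1 | 1000 | 1000 | 11611 | 1000 | 1000 | 1 | 1000
1004 | 743 | 1001 | 1 | 1000 | 1000 | 11980 | 1000 | 1000 | 1 | 1000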

Test 2: throughput

Count: 8

Code:

  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4
  ldr w0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5021

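retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80204 | 40430 | 80105 | 101 | 80004 | 100 | 80010 | 300 | 255517 | 80110 | 200 | 80014 | 200 | 1 | 80000 | 100
80204 | 40176 | 80101 | 101 | 80000 | 100 | 80010 | 300 | 485807 | 80110 | 200 | 80014 | 200 | 1 | 80000 | 100
80204 | 40145 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642158 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40163 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642104 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40166 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642176 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80205 | 40197 | 80138 | 101 | 80037 | 100 | 80009 | 300 | 642857 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40459 | 80101 | 101 | 80000 | 100 | 80057 | 300 | 568977 | 80157 | 200 | 80070 | 200 | 1 | 80000 | 100
80204 | 40171 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642317 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40163 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642227 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40156 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642281 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100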

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5162

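retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80024 | 43557 | 80015 | 11 | 80004 | 10 | 80010 | 30 | 380884 | 80020 | 20 | 80014 | 20 | 1 | 80000 | 10
80024 | 41323 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 660849 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41346 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661739 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41276 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661929 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41294 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661697 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41295 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661856 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41291 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662336 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80025 | 41460 | 80043 | 11 | 80032 | 10 | 80000 | 30 | 661706 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41285 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661511 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41291 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662052 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10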
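
The two "Result" figures for Test 2 appear to follow from the cycle (02) column of each run. The Python sketch below is an illustration only, not the author's tooling. It assumes that "divided by count" means dividing the median cycle count by the total number of test instructions executed (count x unrolls x iterations); that reading is an inference, as is the split of the cycle values out of each data row. The name cycles_per_instruction is hypothetical.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, count, unrolls, iterations):
    """Median cycle count divided by the number of test instructions executed.

    Assumption: the divisor is count * unrolls * iterations, which is how
    "divided by count" is read here.
    """
    executed = count * unrolls * iterations
    return median(cycle_samples) / executed


# cycle (02) values for the 100 unrolls x 100 iterations run
run_100x100 = [40430, 40176, 40145, 40163, 40166,
               40197, 40459, 40171, 40163, 40156]

# cycle (02) values for the 1000 unrolls x 10 iterations run
run_1000x10 = [43557, 41323, 41346, 41276, 41294,
               41295, 41291, 41460, 41285, 41291]

print(f"{cycles_per_instruction(run_100x100, 8, 100, 100):.4f}")  # 0.5021
print(f"{cycles_per_instruction(run_1000x10, 8, 1000, 10):.4f}")  # 0.5162
```

Under these assumptions both reported results are reproduced. Using the median rather than the mean keeps slow outlier samples, such as the first sample of each run, from skewing the figure.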