Apple Microarchitecture Research by Dougall Johnson

LDR (literal, S)

Test 1: uops

Code:

  ldr s0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

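retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 918 | 1001 | 1 | 1000 | 1000 | 10805 | 1000 | 1000 | 1 | 1000
1004 | 702 | 1001 | 1 | 1000 | 1000 | 10955 | 1000 | 1000 | 1 | 1000
1004 | 692 | 1001 | 1 | 1000 | 1000 | 10845 | 1000 | 1000 | 1 | 1000
1004 | 694 | 1001 | 1 | 1000 | 1000 | 10718 | 1000 | 1000 | 1 | 1000
1004 | 696 | 1001 | 1 | 1000 | 1000 | 10745 | 1000 | 1000 | 1 | 1000
1004 | 694 | 1001 | 1 | 1000 | 1000 | 10747 | 1000 | 1000 | 1 | 1000
1004 | 688 | 1001 | 1 | 1000 | 1000 | 10736 | 1000 | 1000 | 1 | 1000
1004 | 695 | 1001 | 1 | 1000 | 1000 | 10783 | 1000 | 1000 | 1 | 1000
1004 | 697 | 1001 | 1 | 1000 | 1000 | 10657 | 1000 | 1000 | 1 | 1000
1004 | 689 | 1001 | 1 | 1000 | 1000 | 10709 | 1000 | 1000 | 1 | 1000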

Test 2: throughput

Count: 8

Code:

  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4
  ldr s0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5024

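retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80204 | 40391 | 80101 | 101 | 80000 | 100 | 80011 | 300 | 643030 | 80111 | 200 | 80015 | 200 | 1 | 80000 | 100
80204 | 40224 | 80107 | 101 | 80006 | 100 | 80008 | 300 | 641954 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40191 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 641789 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40196 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641666 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40191 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641738 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40212 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642635 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40200 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 641717 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40193 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641810 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40148 | 80101 | 101 | 80000 | 100 | 80059 | 300 | 642852 | 80159 | 200 | 80072 | 200 | 1 | 80000 | 100
80204 | 40195 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641846 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100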

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5158

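retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80024 | 43336 | 80017 | 11 | 80006 | 10 | 80000 | 30 | 292410 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41325 | 80011 | 11 | 80000 | 10 | 80192 | 37 | 376536 | 80204 | 22 | 80226 | 20 | 1 | 80000 | 10
80024 | 41266 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661310 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41267 | 80011 | 11 | 80000 | 10 | 80048 | 30 | 604945 | 80058 | 20 | 80057 | 20 | 1 | 80000 | 10
80024 | 41266 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661369 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41259 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661256 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41251 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661360 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41251 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 660974 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41258 | 80011 | 11 | 80000 | 10 | 80048 | 30 | 604664 | 80058 | 20 | 80057 | 20 | 1 | 80000 | 10
80024 | 41455 | 80071 | 11 | 80060 | 10 | 80000 | 30 | 661112 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10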
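
The two "Result" figures in Test 2 appear to be the median of the cycle (02) values across the ten runs, divided by the total number of measured instructions (count × unrolls × iterations). The page does not spell out that normalization, so it is an assumption here, as is the hypothetical helper name `cycles_per_instruction`. The cycle values are read from the rows above. A minimal sketch:

```python
from statistics import median

COUNT = 8  # instructions in the measured block ("Count: 8")


def cycles_per_instruction(cycles, unrolls, iterations, count=COUNT):
    """Median cycle count divided by the total number of measured instructions.

    Assumption: the divisor is count * unrolls * iterations; this is inferred
    from the published numbers, not taken from the benchmark's source.
    """
    return median(cycles) / (count * unrolls * iterations)


# cycle (02) values for "100 unrolls and 100 iterations"
run_100x100 = [40391, 40224, 40191, 40196, 40191,
               40212, 40200, 40193, 40148, 40195]

# cycle (02) values for "1000 unrolls and 10 iterations"
run_1000x10 = [43336, 41325, 41266, 41267, 41266,
               41259, 41251, 41251, 41258, 41455]

r1 = cycles_per_instruction(run_100x100, unrolls=100, iterations=100)
r2 = cycles_per_instruction(run_1000x10, unrolls=1000, iterations=10)

print(f"{r1:.4f}")  # 0.5024
print(f"{r2:.4f}")  # 0.5158
```

Both values match the reported results, which is consistent with a sustained rate of roughly two of these loads per cycle.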