Apple Microarchitecture Research by Dougall Johnson

LDR (literal, D)

Test 1: uops

Code:

  ldr d0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

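| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1004 | 985 | 1001 | 1 | 1000 | 1000 | 10138 | 1000 | 1000 | 1 | 1000 |
| 1004 | 715 | 1001 | 1 | 1000 | 1000 | 10714 | 1000 | 1000 | 1 | 1000 |
| 1004 | 699 | 1001 | 1 | 1000 | 1000 | 10588 | 1000 | 1000 | 1 | 1000 |
| 1004 | 706 | 1001 | 1 | 1000 | 1000 | 10734 | 1000 | 1000 | 1 | 1000 |
| 1004 | 700 | 1001 | 1 | 1000 | 1000 | 10760 | 1000 | 1000 | 1 | 1000 |
| 1004 | 715 | 1001 | 1 | 1000 | 1000 | 10840 | 1000 | 1000 | 1 | 1000 |
| 1004 | 701 | 1001 | 1 | 1000 | 1000 | 10625 | 1000 | 1000 | 1 | 1000 |
| 1004 | 698 | 1001 | 1 | 1000 | 1000 | 10761 | 1000 | 1000 | 1 | 1000 |
| 1004 | 696 | 1001 | 1 | 1000 | 1000 | 10788 | 1000 | 1000 | 1 | 1000 |
| 1004 | 692 | 1001 | 1 | 1000 | 1000 | 10698 | 1000 | 1000 | 1 | 1000 |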

Test 2: throughput

Count: 8

Code:

  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4
  ldr d0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5023

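| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80204 | 40462 | 80105 | 101 | 80004 | 100 | 80155 | 300 | 351539 | 80255 | 200 | 80186 | 200 | 1 | 80000 | 100 |
| 80204 | 40168 | 80107 | 101 | 80006 | 100 | 80008 | 300 | 247111 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40268 | 80101 | 101 | 80000 | 100 | 80010 | 300 | 641153 | 80110 | 200 | 80014 | 200 | 1 | 80000 | 100 |
| 80204 | 40157 | 80101 | 101 | 80000 | 100 | 80059 | 300 | 355996 | 80159 | 200 | 80072 | 200 | 1 | 80000 | 100 |
| 80204 | 40153 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642828 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40195 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642640 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40137 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 643666 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40194 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642604 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40261 | 80131 | 101 | 80030 | 100 | 80008 | 300 | 641830 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |
| 80204 | 40149 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642694 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100 |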

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5202

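| retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80024 | 43640 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 665556 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41725 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667341 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41622 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667576 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80025 | 41726 | 80041 | 11 | 80030 | 10 | 80000 | 30 | 667165 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41608 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667239 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41604 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667548 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41620 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667108 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41605 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667927 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41607 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667748 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |
| 80024 | 41591 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667635 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10 |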
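
The two throughput results can be reproduced from the cycle (02) counter values. The sketch below is one plausible reading of "median cycles for code divided by count", not the author's actual tooling. It assumes the divisor is the total number of executed loads (count × unrolls × iterations = 80000 in both runs, which is consistent with the ldst retires counter) and that no loop overhead is subtracted. The function and variable names are hypothetical.

```python
from statistics import median

# cycle (02) values read from the ten samples of each Test 2 run
CYCLES_100_UNROLLS_100_ITERS = [40462, 40168, 40268, 40157, 40153,
                                40195, 40137, 40194, 40261, 40149]
CYCLES_1000_UNROLLS_10_ITERS = [43640, 41725, 41622, 41726, 41608,
                                41604, 41620, 41605, 41607, 41591]

COUNT = 8  # instructions in the measured block ("Count: 8")


def cycles_per_instruction(cycles, unrolls, iterations, count=COUNT):
    """Median cycle count divided by the total instructions executed.

    Assumption: total instructions = count * unrolls * iterations.
    """
    return median(cycles) / (count * unrolls * iterations)


r_short = cycles_per_instruction(CYCLES_100_UNROLLS_100_ITERS, 100, 100)
r_long = cycles_per_instruction(CYCLES_1000_UNROLLS_10_ITERS, 1000, 10)

print(f"100 unrolls x 100 iterations: {r_short:.4f} cycles/instruction")
print(f"1000 unrolls x 10 iterations: {r_long:.4f} cycles/instruction")
```

Under these assumptions both values round to the reported results (0.5023 and 0.5202), which is about two of these loads per cycle.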