Apple Microarchitecture Research by Dougall Johnson

LDR (literal, 64-bit)

Test 1: uops

Code:

  ldr x0, .+4

(no loop instructions)

1000 unrolls and 1 iteration
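
For reference, the instruction under test is a PC-relative load: `ldr x0, .+4` reads 8 bytes starting at the address of the following instruction. The sketch below shows how that line would be encoded, assuming the standard A64 LDR (literal) layout (opc=01 for the 64-bit variant, imm19 in bits 23:5, Rt in bits 4:0). The function name is hypothetical and is not part of the original test harness.

```python
def encode_ldr_literal_64(rt: int, byte_offset: int) -> int:
    """Encode `ldr x<rt>, .+<byte_offset>` (LDR literal, 64-bit variant)."""
    if not 0 <= rt <= 31:
        raise ValueError("rt must be a register number in 0..31")
    if byte_offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4")
    imm19 = byte_offset // 4  # the offset is stored in units of 4 bytes
    if not -(1 << 18) <= imm19 < (1 << 18):
        raise ValueError("offset does not fit in a signed 19-bit field")
    # 0x58000000 carries opc=01 plus the fixed load-literal bits.
    return 0x58000000 | ((imm19 & 0x7FFFF) << 5) | rt


# The instruction used in both tests: ldr x0, .+4
print(hex(encode_ldr_literal_64(0, 4)))
```

With `rt=0` and an offset of 4 this gives `0x58000020`.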

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

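retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 945 | 1001 | 1 | 1000 | 1000 | 11097 | 1000 | 1000 | 1 | 1000
1004 | 715 | 1001 | 1 | 1000 | 1000 | 11022 | 1000 | 1000 | 1 | 1000
1004 | 719 | 1001 | 1 | 1000 | 1000 | 11153 | 1000 | 1000 | 1 | 1000
1004 | 717 | 1001 | 1 | 1000 | 1000 | 10961 | 1000 | 1000 | 1 | 1000
1004 | 702 | 1001 | 1 | 1000 | 1000 | 11109 | 1000 | 1000 | 1 | 1000
1004 | 725 | 1001 | 1 | 1000 | 1000 | 10971 | 1000 | 1000 | 1 | 1000
1004 | 725 | 1001 | 1 | 1000 | 1000 | 10927 | 1000 | 1000 | 1 | 1000
1004 | 712 | 1001 | 1 | 1000 | 1000 | 11152 | 1000 | 1000 | 1 | 1000
1004 | 721 | 1001 | 1 | 1000 | 1000 | 11134 | 1000 | 1000 | 1 | 1000
1004 | 717 | 1001 | 1 | 1000 | 1000 | 11207 | 1000 | 1000 | 1 | 1000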
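
The per-instruction figures reported for Test 1 can be approximated by dividing the raw counters by the number of executed copies (1000 unrolls, 1 iteration). The sketch below does this for the first sample row. The mapping from summary lines to counters (Retires to retire uop, Issues to schedule uop, and so on) is an assumption, as is the helper name.

```python
# First sample row of the Test 1 counter table (all ten rows agree on these columns).
sample = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 1001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 1000,
}
UNROLLS = 1000
ITERATIONS = 1


def per_instruction(counters, unrolls, iterations):
    """Divide each raw counter by the number of executed copies of the instruction."""
    executed = unrolls * iterations
    return {name: value / executed for name, value in counters.items()}


normalized = per_instruction(sample, UNROLLS, ITERATIONS)
for name, value in normalized.items():
    print(f"{name}: {value:.3f}")
```

This yields 1.004, 1.001, 0.001 and 1.000. The reported Retires and Issues values are 1.000, so the handful of extra uops in the raw counters is presumably measurement overhead that the summary excludes; that interpretation is a guess, not something the page states.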

Test 2: throughput

Count: 8

Code:

  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4
  ldr x0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5022

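retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80204 | 40320 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642356 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40232 | 80107 | 101 | 80006 | 100 | 80008 | 300 | 643022 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40169 | 80101 | 101 | 80000 | 100 | 80010 | 300 | 486266 | 80110 | 200 | 80014 | 200 | 1 | 80000 | 100
80204 | 40158 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642314 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40157 | 80101 | 101 | 80000 | 100 | 80008 | 303 | 642896 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40206 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642224 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40168 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642158 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40152 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642260 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40164 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 642158 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40164 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641990 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100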

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5201

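retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80024 | 43704 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 657046 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41674 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667511 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41599 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667855 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41606 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667592 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41585 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667519 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41617 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667875 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41611 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667929 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41580 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667195 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41609 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667447 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41617 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 667282 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10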
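
The "median cycles for code divided by count" figure can be roughly recomputed from the cycle (02) samples of the two throughput runs. This is a minimal sketch, assuming the result is the median cycle count divided by unrolls × iterations × count, with loop overhead left in. The function name and that exact normalization are assumptions rather than the harness's actual code.

```python
from statistics import median


def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycle count divided by the total number of executed instructions."""
    executed = unrolls * iterations * count
    return median(cycle_samples) / executed


# cycle (02) values from the ten samples of each throughput run.
run_100x100 = [40320, 40232, 40169, 40158, 40157, 40206, 40168, 40152, 40164, 40164]
run_1000x10 = [43704, 41674, 41599, 41606, 41585, 41617, 41611, 41580, 41609, 41617]

print(round(cycles_per_instruction(run_100x100, 100, 100, 8), 4))
print(round(cycles_per_instruction(run_1000x10, 1000, 10, 8), 4))
```

Under these assumptions the 1000×10 run comes out at 0.5201, matching the reported value. The 100×100 run comes out at 0.5021, within 0.0001 of the reported 0.5022, so the reported figure may come from slightly different samples or rounding than the rows listed here.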