Apple Microarchitecture Research by Dougall Johnson

LDR (literal, Q)

Test 1: uops

Code:

  ldr q0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

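retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 975 | 1001 | 1 | 1000 | 1000 | 10884 | 1000 | 1000 | 1 | 1000
1004 | 700 | 1001 | 1 | 1000 | 1000 | 10664 | 1000 | 1000 | 1 | 1000
1004 | 685 | 1001 | 1 | 1000 | 1000 | 10646 | 1000 | 1000 | 1 | 1000
1004 | 680 | 1001 | 1 | 1000 | 1000 | 10439 | 1000 | 1000 | 1 | 1000
1004 | 685 | 1001 | 1 | 1000 | 1000 | 10682 | 1000 | 1000 | 1 | 1000
1004 | 685 | 1001 | 1 | 1000 | 1000 | 10574 | 1000 | 1000 | 1 | 1000
1004 | 690 | 1001 | 1 | 1000 | 1000 | 10475 | 1000 | 1000 | 1 | 1000
1004 | 677 | 1001 | 1 | 1000 | 1000 | 10574 | 1000 | 1000 | 1 | 1000
1004 | 682 | 1001 | 1 | 1000 | 1000 | 10439 | 1000 | 1000 | 1 | 1000
1004 | 686 | 1001 | 1 | 1000 | 1000 | 10574 | 1000 | 1000 | 1 | 1000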
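
As a rough cross-check, the per-instruction figures reported for this test can be approximated by taking the median of each counter over the ten runs and dividing by the 1000 unrolled instructions. The Python sketch below does this for five of the counters. It assumes the rows above are read as the eleven listed counters in order (that split is inferred, not stated by the source), and `per_instruction` is a hypothetical helper name. The raw medians land slightly above the reported values (for example about 1.004 retires rather than 1.000), presumably because the reported figures subtract harness overhead, which this sketch does not model.

```python
from statistics import median

UNROLLS = 1000  # "1000 unrolls and 1 iteration"

# Counter names for the columns transcribed below (a subset of the table).
names = ["retire uop", "cycle", "schedule uop",
         "schedule int uop", "schedule ldst uop"]

# One tuple per run; the column boundaries are an assumption.
runs = [
    (1004, 975, 1001, 1, 1000),
    (1004, 700, 1001, 1, 1000),
    (1004, 685, 1001, 1, 1000),
    (1004, 680, 1001, 1, 1000),
    (1004, 685, 1001, 1, 1000),
    (1004, 685, 1001, 1, 1000),
    (1004, 690, 1001, 1, 1000),
    (1004, 677, 1001, 1, 1000),
    (1004, 682, 1001, 1, 1000),
    (1004, 686, 1001, 1, 1000),
]

def per_instruction(rows, unrolls):
    """Median of each counter column divided by the unrolled instruction count."""
    return [median(column) / unrolls for column in zip(*rows)]

rates = dict(zip(names, per_instruction(runs, UNROLLS)))
for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

The median is used rather than the mean so that the slow first run (975 cycles) does not skew the cycle figure.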

Test 2: throughput

Count: 8

Code:

  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4
  ldr q0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5023

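retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80204 | 40410 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80010 | 300 | 254066 | 80110 | 200 | 80014 | 200 | 0 | 1 | 80000 | 100
80204 | 40156 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80060 | 300 | 642032 | 80160 | 200 | 80072 | 200 | 0 | 1 | 80000 | 100
80204 | 40154 | 80108 | 101 | 0 | 80007 | 100 | 0 | 80008 | 300 | 642552 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40140 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642678 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40176 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642606 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40144 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642534 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40183 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642408 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40178 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642516 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40137 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642696 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100
80204 | 40193 | 80101 | 101 | 0 | 80000 | 100 | 0 | 80008 | 300 | 642516 | 80108 | 200 | 80012 | 200 | 0 | 1 | 80000 | 100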

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5279

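retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80025 | 44683 | 80041 | 11 | 80030 | 10 | 80203 | 30 | 668746 | 80213 | 20 | 80242 | 20 | 1 | 80000 | 10
80024 | 42266 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678672 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42228 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 679152 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42216 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678928 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42208 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678508 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42235 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678726 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42225 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678902 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42216 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678857 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42202 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 678849 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 42216 | 80011 | 11 | 80000 | 10 | 80059 | 30 | 664016 | 80069 | 20 | 80072 | 20 | 1 | 80000 | 10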
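
The "median cycles for code divided by count" results in Test 2 can be approximately reproduced from the cycle (02) column of the two tables. The Python sketch below assumes the normalization is median total cycles divided by count × unrolls × iterations. That is an inference from the roughly 80000 retired load uops per run, not something the source states, and `cycles_per_instruction` is a hypothetical helper name. The cycle values are transcribed from the tables under the assumption that cycle (02) is the second counter in each row.

```python
from statistics import median

def cycles_per_instruction(cycles, count, unrolls, iterations):
    """Median total cycles divided by the number of instructions executed."""
    return median(cycles) / (count * unrolls * iterations)

COUNT = 8  # "Count: 8" (eight LDR instructions per unrolled block)

# cycle (02) values for the ten runs of each configuration.
run_100x100 = [40410, 40156, 40154, 40140, 40176,
               40144, 40183, 40178, 40137, 40193]
run_1000x10 = [44683, 42266, 42228, 42216, 42208,
               42235, 42225, 42216, 42202, 42216]

r1 = cycles_per_instruction(run_100x100, COUNT, unrolls=100, iterations=100)
r2 = cycles_per_instruction(run_1000x10, COUNT, unrolls=1000, iterations=10)

print(f"100 unrolls x 100 iterations: {r1:.4f} cycles per instruction")
print(f"1000 unrolls x 10 iterations: {r2:.4f} cycles per instruction")
```

This gives about 0.502 and 0.528 cycles per instruction, close to but not identical to the reported 0.5023 and 0.5279. The reported results may come from separate timing runs rather than from the counter runs tabulated here.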