Apple Microarchitecture Research by Dougall Johnson

LDRSW (literal)

Test 1: uops

Code:

  ldrsw x0, .+4

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

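retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 992 | 1001 | 1 | 1000 | 1000 | 11507 | 1000 | 1000 | 1 | 1000
1004 | 734 | 1001 | 1 | 1000 | 1000 | 11081 | 1000 | 1000 | 1 | 1000
1004 | 735 | 1001 | 1 | 1000 | 1000 | 11426 | 1000 | 1000 | 1 | 1000
1004 | 736 | 1001 | 1 | 1000 | 1000 | 11309 | 1000 | 1000 | 1 | 1000
1004 | 719 | 1001 | 1 | 1000 | 1000 | 11342 | 1000 | 1000 | 1 | 1000
1004 | 720 | 1001 | 1 | 1000 | 1000 | 11189 | 1000 | 1000 | 1 | 1000
1004 | 733 | 1001 | 1 | 1000 | 1000 | 11115 | 1000 | 1000 | 1 | 1000
1004 | 721 | 1001 | 1 | 1000 | 1000 | 11297 | 1000 | 1000 | 1 | 1000
1004 | 722 | 1001 | 1 | 1000 | 1000 | 11381 | 1000 | 1000 | 1 | 1000
1004 | 727 | 1001 | 1 | 1000 | 1000 | 11396 | 1000 | 1000 | 1 | 1000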

Test 2: throughput

Count: 8

Code:

  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4
  ldrsw x0, .+4

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 0.5021

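retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80204 | 40410 | 80101 | 101 | 80000 | 100 | 80010 | 302 | 255274 | 80110 | 200 | 80014 | 200 | 1 | 80000 | 100
80204 | 40253 | 80107 | 101 | 80006 | 100 | 80011 | 300 | 642852 | 80111 | 200 | 80015 | 200 | 1 | 80000 | 100
80204 | 40171 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642857 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40155 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642803 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40171 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642839 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40155 | 80101 | 101 | 80000 | 100 | 80008 | 300 | 641870 | 80108 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40169 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642947 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40203 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642281 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40203 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642371 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100
80204 | 40209 | 80101 | 101 | 80000 | 100 | 80009 | 300 | 642353 | 80109 | 200 | 80012 | 200 | 1 | 80000 | 100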

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 0.5163

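retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
80024 | 43394 | 80015 | 11 | 80004 | 10 | 80000 | 30 | 380657 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41368 | 80011 | 11 | 80000 | 10 | 80192 | 30 | 566215 | 80202 | 20 | 80228 | 20 | 1 | 80000 | 10
80024 | 41287 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661153 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41310 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661376 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41295 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 661977 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41305 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662121 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41300 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662110 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41310 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662209 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41291 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662191 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10
80024 | 41293 | 80011 | 11 | 80000 | 10 | 80000 | 30 | 662172 | 80010 | 20 | 80000 | 20 | 1 | 80000 | 10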
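
The "median cycles for code divided by count" figures in Test 2 can be approximated from the cycle (02) counter. The Python sketch below is illustrative only: the function and variable names are hypothetical, the cycle values are the second field of each counter row as read here, and the ordinary statistical median is an assumption. The original harness does not state its median convention or whether it subtracts loop overhead, so the output lands within a few ten-thousandths of the reported 0.5021 and 0.5163 rather than matching exactly.

```python
from statistics import median

# Values of the "cycle (02)" counter for the ten samples of each Test 2 run.
CYCLES_100x100 = [40410, 40253, 40171, 40155, 40171,
                  40155, 40169, 40203, 40203, 40209]
CYCLES_1000x10 = [43394, 41368, 41287, 41310, 41295,
                  41305, 41300, 41310, 41291, 41293]


def cycles_per_instruction(cycles, unrolls, iterations, count):
    """Median total cycles, per copy of the code block, per instruction.

    Assumption: the plain statistical median is used and no loop
    overhead is subtracted; the original tool may differ on both points.
    """
    code_executions = unrolls * iterations  # copies of the 8-load block run
    return median(cycles) / code_executions / count


# Count is 8 because the code block contains eight LDRSW (literal) loads.
print(f"{cycles_per_instruction(CYCLES_100x100, 100, 100, 8):.4f}")
print(f"{cycles_per_instruction(CYCLES_1000x10, 1000, 10, 8):.4f}")
```

Both runs execute 80000 loads in total (100 x 100 x 8 and 1000 x 10 x 8), so the two results are directly comparable; each works out to roughly half a cycle per load.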