Apple Microarchitecture Research by Dougall Johnson

STLXR (32-bit)

Test 1: uops

Code:

  stlxr w0, w1, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

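retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 3159 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000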
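
As a quick consistency check on the Test 1 samples, the total "schedule uop (52)" count can be compared with the sum of the per-unit schedule counters (53 and 55). The sketch below is illustrative Python, not part of the original measurement harness. The values are read from the first sample and a steady-state sample above, which assumes the digit runs split into columns as the header suggests, and the dictionary keys are hypothetical names chosen here.

```python
# Schedule counters from two Test 1 samples (values read from the rows above).
# The keys are hypothetical names for counters 52, 53 and 55.
samples = [
    {"schedule_uop": 1019, "schedule_int": 1, "schedule_ldst": 1018},  # first sample
    {"schedule_uop": 1001, "schedule_int": 1, "schedule_ldst": 1000},  # steady state
]


def schedule_residual(row):
    """Total scheduled uops minus the per-unit counts; 0 means they add up."""
    return row["schedule_uop"] - (row["schedule_int"] + row["schedule_ldst"])


residuals = [schedule_residual(row) for row in samples]
print(residuals)  # [0, 0]
```

A zero residual in both samples is consistent with the load/store unit accounting for all but one scheduled uop in this test.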

Test 2: throughput

Code:

  stlxr w0, w1, [x6]
  add x6, x6, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.1348

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20209 | 31947 | 20321 | 10231 | 0 | 10090 | 10230 | 0 | 10003 | 35473 | 351181 | 20106 | 10203 | 10003 | 10203 | 20006 | 10004 | 10000 | 10100
29863 | 116360 | 28971 | 15273 | 33 | 13665 | 15107 | 32 | 10002 | 35392 | 350876 | 20103 | 10201 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31322 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351059 | 20104 | 10202 | 10002 | 10203 | 20006 | 10004 | 10000 | 10100
20204 | 31362 | 20104 | 10104 | 0 | 10000 | 10103 | 0 | 10002 | 35491 | 351217 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31339 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351247 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31345 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351320 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31343 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351259 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31341 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351245 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31347 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351376 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 31342 | 20103 | 10103 | 0 | 10000 | 10102 | 0 | 10002 | 35491 | 351311 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 3.1437

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20029 | 31980 | 20228 | 10138 | 10090 | 10137 | 10002 | 35244 | 352809 | 20014 | 10022 | 10002 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31454 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352438 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31444 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352470 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31440 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352521 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31426 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352641 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31441 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352346 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31442 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352310 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31443 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352450 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31430 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352343 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 31413 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352095 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010

Test 3: throughput

Code:

  stlxr w0, w1, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0040

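retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 30397 | 10119 | 101 | 10018 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30096 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30050 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528929 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100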

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0040

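retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10025 | 30153 | 10029 | 11 | 0 | 10018 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10072 | 30 | 529365 | 10082 | 20 | 10092 | 20 | 20096 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 0 | 10000 | 10 | 0 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10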
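
The Test 3 results can be reproduced from the cycle (02) column, assuming the reported figure is the median cycle count divided by the number of unrolled copies executed (unrolls × iterations). That formula is an inference from the numbers on this page, not a description of the actual harness, and the function name below is hypothetical.

```python
from statistics import median


def cycles_per_copy(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of unrolled copies run."""
    return median(cycle_samples) / (unrolls * iterations)


# cycle (02) values from the ten samples of each Test 3 run
run_100x100 = [30397, 30047, 30040, 30040, 30040, 30040, 30040, 30040, 30096, 30050]
run_1000x10 = [30153, 30047, 30040, 30047, 30040, 30040, 30040, 30040, 30040, 30040]

print(f"{cycles_per_copy(run_100x100, 100, 100):.4f}")  # 3.0040
print(f"{cycles_per_copy(run_1000x10, 1000, 10):.4f}")  # 3.0040
```

Applying the same calculation to the ten Test 2 samples shown gives values close to, but not identical to, the reported 3.1348 and 3.1437, so the harness may take its median over data not listed on this page.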