Apple Microarchitecture Research by Dougall Johnson

STLXRB

Test 1: uops

Code:

  stlxrb w0, w1, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

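retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 3160 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000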

Test 2: throughput

Code:

  stlxrb w0, w1, [x6]
  add x6, x6, 2

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0424

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20206 | 30681 | 20194 | 10158 | 10036 | 10157 | 10003 | 35477 | 339277 | 20106 | 10203 | 10003 | 10203 | 20006 | 10004 | 10000 | 10100
20204 | 30440 | 20104 | 10104 | 10000 | 10103 | 10003 | 35477 | 339221 | 20106 | 10203 | 10003 | 10232 | 20064 | 10033 | 10000 | 10100
20204 | 30414 | 20103 | 10103 | 10000 | 10102 | 10002 | 35463 | 339902 | 20103 | 10201 | 10002 | 10203 | 20006 | 10004 | 10000 | 10100
20204 | 30454 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339140 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 30416 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339405 | 20104 | 10202 | 10002 | 10232 | 20064 | 10033 | 10000 | 10100
20204 | 30462 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339025 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 30429 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339179 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 30403 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339222 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 30417 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339163 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 30428 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 339193 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0442

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20026 | 30704 | 20102 | 10066 | 10036 | 10065 | 10002 | 35266 | 340175 | 20014 | 10022 | 10002 | 10085 | 20128 | 10064 | 10000 | 10010
20024 | 30465 | 20012 | 10012 | 10000 | 10011 | 10000 | 35259 | 339346 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20027 | 30541 | 20112 | 10076 | 10036 | 10075 | 10000 | 35259 | 339713 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 30436 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 339341 | 20010 | 10020 | 10000 | 10054 | 20066 | 10035 | 10000 | 10010
20024 | 30434 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 339419 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 30457 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 339119 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 30430 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 339337 | 20010 | 10020 | 10000 | 10051 | 20060 | 10032 | 10000 | 10010
20024 | 30433 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 339448 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20025 | 30504 | 20061 | 10043 | 10018 | 10044 | 10000 | 35259 | 339595 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 30434 | 20011 | 10011 | 10000 | 10010 | 10062 | 35936 | 341382 | 20134 | 10082 | 10062 | 10020 | 20000 | 10001 | 10000 | 10010
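
The two Test 2 results above (3.0424 and 3.0442 median cycles) can also be read as a throughput figure. The following is a minimal Python sketch, assuming that "median cycles for code" means cycles per unrolled copy of the two-instruction snippet, so that each copy contains exactly one STLXRB. The names `median_cycles` and `per_cycle` are illustrative and not part of the original test harness.

```python
# Reported Test 2 medians, taken as cycles per unrolled copy of the code
# (an assumption about what "median cycles for code" measures).
median_cycles = {
    "100 unrolls x 100 iterations": 3.0424,
    "1000 unrolls x 10 iterations": 3.0442,
}


def per_cycle(cycles_per_copy: float) -> float:
    """Convert cycles per copy into copies completed per cycle."""
    return 1.0 / cycles_per_copy


# Each copy holds one STLXRB, so copies per cycle equals STLXRB per cycle.
throughput = {name: per_cycle(c) for name, c in median_cycles.items()}

for name, value in throughput.items():
    print(f"{name}: about {value:.4f} STLXRB per cycle")
```

Under that assumption, both configurations work out to roughly one STLXRB every three cycles, or about 0.33 per cycle.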

Test 3: throughput

Code:

  stlxrb w0, w1, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0040

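retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 30201 | 10119 | 101 | 10018 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30090 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528857 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100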

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0040

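retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10025 | 30284 | 10029 | 11 | 10018 | 10 | 10000 | 30 | 529755 | 10010 | 20 | 10004 | 20 | 20008 | 1 | 10000 | 10
10024 | 30044 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 529415 | 10010 | 20 | 10000 | 20 | 20352 | 1 | 10000 | 10
10024 | 30068 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528839 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10144 | 30 | 533645 | 10154 | 20 | 10176 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528713 | 10010 | 20 | 10000 | 20 | 20096 | 1 | 10000 | 10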
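
The Test 3 result of 3.0040 can be related to the raw samples with a short calculation. This is a sketch under two assumptions: that the second counter in each sample row is cycle (02), with the values listed below, and that the reported figure is the median cycle count divided by unrolls times iterations. Whether the original harness computes it exactly this way is not stated on the page, and `cycles_per_copy` is an illustrative helper, not part of it.

```python
from statistics import median

# cycle (02) counts for the ten samples of each Test 3 configuration, keyed by
# (unrolls, iterations). The values are an assumed reading of the counter dumps.
samples = {
    (100, 100): [30201] + [30040] * 8 + [30090],
    (1000, 10): [30284, 30044, 30068] + [30040] * 7,
}


def cycles_per_copy(cycles, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code."""
    return median(cycles) / (unrolls * iterations)


results = {cfg: cycles_per_copy(c, *cfg) for cfg, c in samples.items()}

for (unrolls, iterations), value in results.items():
    print(f"{unrolls} unrolls x {iterations} iterations: {value:.4f}")
```

Both configurations execute 10000 copies in total, and under these assumptions both come out at 3.0040 cycles per copy, which agrees with the reported results.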