Apple Microarchitecture Research by Dougall Johnson

STLXR (64-bit)

Test 1: uops

Code:

  stlxr w0, x1, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

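retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 3159 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51713 | 1000 | 1000 | 2000 | 1 | 1000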

Test 2: throughput

Code:

  stlxr w0, x1, [x6]
  add x6, x6, 16

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.2396

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20214 | 33665 | 20533 | 10353 | 10180 | 10352 | 10003 | 35477 | 365390 | 20106 | 10203 | 10003 | 10203 | 20006 | 10004 | 10000 | 10100
20204 | 32343 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363604 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32366 | 20104 | 10104 | 10000 | 10103 | 10002 | 35467 | 363644 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32343 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363718 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32344 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363846 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32346 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363750 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32330 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363603 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32330 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363773 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32359 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 364608 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100
20204 | 32326 | 20103 | 10103 | 10000 | 10102 | 10002 | 35467 | 363868 | 20104 | 10202 | 10002 | 10202 | 20004 | 10003 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 3.2567

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20034 | 33827 | 20447 | 10267 | 10180 | 10266 | 10002 | 35242 | 367114 | 20014 | 10022 | 10002 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32570 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366662 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32563 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366376 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32580 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366746 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32586 | 20011 | 10011 | 10000 | 10010 | 10034 | 35583 | 369345 | 20078 | 10054 | 10034 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32586 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366589 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32569 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366361 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32554 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 367336 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010
20024 | 32516 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 365615 | 20010 | 10020 | 10000 | 10052 | 20064 | 10033 | 10000 | 10010
20024 | 32489 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 366127 | 20010 | 10020 | 10000 | 10020 | 20000 | 10001 | 10000 | 10010

Test 3: throughput

Code:

  stlxr w0, x1, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0047

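retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 30153 | 10119 | 101 | 10018 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30098 | 10101 | 101 | 10000 | 100 | 10144 | 321 | 533949 | 10250 | 206 | 10188 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30063 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 529665 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10036 | 300 | 529321 | 10136 | 200 | 10048 | 200 | 20008 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 20008 | 1 | 10000 | 100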

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0040

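retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10025 | 30280 | 10029 | 11 | 10018 | 10 | 10000 | 30 | 528927 | 10010 | 20 | 10004 | 20 | 20008 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528981 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10
10024 | 30040 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 20000 | 1 | 10000 | 10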
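
The "Result (median cycles for code)" figures for Test 3 appear to follow from the cycle (02) counter: take the median over the ten samples and divide by unrolls times iterations. The sketch below is only an illustration of that arithmetic, not the measurement tool itself. The function name `cycles_per_block` is hypothetical, and the sample lists assume that the second value in each Test 3 row is the cycle count.

```python
from statistics import median


def cycles_per_block(cycle_samples, unrolls, iterations):
    """Median cycle count divided by how many times the code block ran.

    Assumption: this mirrors how the page's "Result" line is derived.
    """
    return median(cycle_samples) / (unrolls * iterations)


# Cycle (02) values as read from the ten rows of each Test 3 table.
run_100x100 = [30153, 30098, 30047, 30047, 30047,
               30047, 30063, 30047, 30047, 30047]
run_1000x10 = [30280, 30047] + [30040] * 8

print(f"{cycles_per_block(run_100x100, 100, 100):.4f}")  # 3.0047
print(f"{cycles_per_block(run_1000x10, 1000, 10):.4f}")  # 3.0040
```

Both values agree with the Test 3 results stated above. Applying the same arithmetic to the Test 2 rows gives figures close to, but not identical with, the stated results, so the published medians may be taken over more samples than the ten rows shown.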