Apple Microarchitecture Research by Dougall Johnson

STLXP (64-bit)

Test 1: uops

Code:

  stlxp w0, x1, x2, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

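retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 3159 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1005 | 3087 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3047 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000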
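
As a rough cross-check of the summary figures above, the raw counters can be divided by the unroll count. The sketch below is an illustration only, not the harness that produced this page. The dictionary keys are shortened labels chosen here, and the values are taken from the most common row of the Test 1 table, assuming one value per counter column in header order.

```python
# Steady-state row from Test 1 (1000 unrolls, 1 iteration).
# Keys are shortened labels invented for this sketch, not harness names.
UNROLLS = 1000

row = {
    "retire_uop": 1004,
    "cycle": 3047,
    "schedule_uop": 1001,
    "schedule_int_uop": 1,
    "schedule_ldst_uop": 1000,
    "dispatch_ldst_uop": 1000,
    "map_ldst_uop": 1000,
    "map_ldst_uop_inputs": 3000,
}


def per_copy(counters, unrolls):
    """Divide each raw counter by the number of unrolled code copies."""
    return {name: value / unrolls for name, value in counters.items()}


rates = per_copy(row, UNROLLS)
for name, rate in rates.items():
    print(f"{name:22s} {rate:.3f}")
```

With these inputs the load/store issue rate comes out at 1.000 per copy and the integer issue rate at 0.001, in line with the summary. The raw retire figure is 1.004 rather than 1.000, which suggests the summary subtracts a small fixed overhead; that is an inference, not something the page states. Three register inputs per load/store uop would be consistent with the two data registers plus the base address.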

Test 2: throughput

Code:

  stlxp w0, x1, x2, [x6]
  add x6, x6, 16

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.2364

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20214 | 33996 | 20533 | 10353 | 10180 | 10352 | 10003 | 35477 | 365260 | 20106 | 10203 | 10003 | 10203 | 30009 | 10004 | 10000 | 10100
20204 | 32379 | 20104 | 10104 | 10000 | 10103 | 10002 | 35491 | 364327 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32377 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364164 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32455 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364146 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32375 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364331 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32372 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364336 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32378 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364072 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32389 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364261 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32377 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364040 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 32381 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 364127 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 3.2572

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20034 | 36351 | 20448 | 10268 | 10180 | 10267 | 10002 | 35242 | 367209 | 20014 | 10022 | 10002 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32604 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366798 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20025 | 32643 | 20062 | 10044 | 10018 | 10043 | 10000 | 35242 | 366605 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32570 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366835 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32578 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366640 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32577 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366797 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32582 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366636 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32575 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366788 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32578 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366762 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 32577 | 20011 | 10011 | 10000 | 10010 | 10000 | 35259 | 366744 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
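
The reported throughput for Test 2 can be approximated from the cycle (02) column alone. The following is a minimal sketch, assuming the result is close to the median raw cycle count divided by the number of executed code copies. The function name and list names are hypothetical, and the original harness may correct for overhead or use a different sample set.

```python
from statistics import median

# Cycle (02) values for Test 2, one per measurement row.
cycles_100x100 = [33996, 32379, 32377, 32455, 32375,
                  32372, 32378, 32389, 32377, 32381]
cycles_1000x10 = [36351, 32604, 32643, 32570, 32578,
                  32577, 32582, 32575, 32578, 32577]


def cycles_per_copy(cycles, unrolls, iterations):
    """Median raw cycle count divided by the number of executed code copies."""
    return median(cycles) / (unrolls * iterations)


a = cycles_per_copy(cycles_100x100, 100, 100)
b = cycles_per_copy(cycles_1000x10, 1000, 10)
print(f"100 unrolls x 100 iterations: {a:.4f}")
print(f"1000 unrolls x 10 iterations: {b:.4f}")
```

Both raw medians land within about 0.002 cycles of the reported 3.2364 and 3.2572. The median is used rather than the mean so that the slower first row of each run does not skew the figure.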

Test 3: throughput

Code:

  stlxp w0, x1, x2, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0047

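retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 30518 | 10119 | 101 | 10018 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 530236 | 10100 | 200 | 10006 | 200 | 30012 | 1 | 10000 | 100
10204 | 30048 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 529017 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30044 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528945 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30046 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528855 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100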

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0047

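retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10025 | 30162 | 10029 | 11 | 10018 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10004 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10025 | 30087 | 10029 | 11 | 10018 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30102 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30055 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10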
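
Tests 2 and 3 differ only in the second instruction (add x6, x6, 16 versus mov x7, 8), so comparing their integer counters isolates what that instruction costs. The sketch below is illustrative. It uses steady-state rows from the 100 unrolls and 100 iterations runs, with shortened key names chosen here, and it assumes the fused SUBS/B.cc loop contributes roughly one integer issue per iteration.

```python
COPIES = 100 * 100   # 100 unrolls x 100 iterations
ITERATIONS = 100     # assumed: one integer issue per fused SUBS/B.cc pair

# Steady-state rows; keys are shortened labels invented for this sketch.
test2 = {"retire_uop": 20204, "schedule_int_uop": 10104}  # stlxp + add
test3 = {"retire_uop": 10204, "schedule_int_uop": 101}    # stlxp + mov


def int_issues_per_copy(row):
    """Integer-unit issues per code copy after removing the assumed loop overhead."""
    return (row["schedule_int_uop"] - ITERATIONS) / COPIES


add_cost = int_issues_per_copy(test2)
mov_cost = int_issues_per_copy(test3)
retire_gap = (test2["retire_uop"] - test3["retire_uop"]) / COPIES

print(f"int issues per copy with add: {add_cost:.3f}")
print(f"int issues per copy with mov: {mov_cost:.3f}")
print(f"retire uop difference per copy: {retire_gap:.3f}")
```

Under these assumptions the ADD accounts for about one integer issue per copy and the MOV for essentially none, and the retire counter differs by one per copy. That would be consistent with the immediate move being handled without an integer execution unit, although the page does not state the mechanism.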