Apple Microarchitecture Research by Dougall Johnson

STLXP (32-bit)

Test 1: uops

Code:

  stlxp w0, w1, w2, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

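retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1005 | 3309 | 1019 | 1 | 1018 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3054 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000
1004 | 3040 | 1001 | 1 | 1000 | 1000 | 51855 | 1000 | 1000 | 3000 | 1 | 1000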

Test 2: throughput

Code:

  stlxp w0, w1, w2, [x6]
  add x6, x6, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.1324

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20209 | 32031 | 20321 | 10231 | 10090 | 10230 | 10003 | 35473 | 351109 | 20106 | 10203 | 10003 | 10203 | 30012 | 10002 | 10000 | 10100
20204 | 31354 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 350708 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31358 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351180 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31293 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351120 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31350 | 20103 | 10103 | 10000 | 10102 | 10033 | 35828 | 354368 | 20166 | 10233 | 10033 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31344 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351266 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31355 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351173 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31352 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351290 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31344 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351144 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100
20204 | 31336 | 20103 | 10103 | 10000 | 10102 | 10002 | 35491 | 351070 | 20104 | 10202 | 10002 | 10202 | 30006 | 10003 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 3.1426

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20029 | 32007 | 20228 | 10138 | 10090 | 10137 | 10002 | 35244 | 352476 | 20014 | 10022 | 10002 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31412 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352285 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31429 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352363 | 20010 | 10020 | 10000 | 10054 | 30093 | 10034 | 10000 | 10010
20024 | 31432 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352268 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31428 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352360 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31424 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352259 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31430 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352234 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31424 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352453 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31428 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352311 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010
20024 | 31433 | 20011 | 10011 | 10000 | 10010 | 10000 | 35235 | 352499 | 20010 | 10020 | 10000 | 10020 | 30000 | 10001 | 10000 | 10010

Test 3: throughput

Code:

  stlxp w0, w1, w2, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 3.0040

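retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 30156 | 10119 | 101 | 10018 | 100 | 10000 | 300 | 528893 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528767 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30047 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30049 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30051 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 529001 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100
10204 | 30040 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 528713 | 10100 | 200 | 10004 | 200 | 30012 | 1 | 10000 | 100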

1000 unrolls and 10 iterations

Result (median cycles for code): 3.0047

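retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10025 | 30165 | 10029 | 11 | 10018 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10004 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10036 | 30 | 529179 | 10046 | 20 | 10048 | 20 | 30000 | 1 | 10000 | 10
10024 | 30096 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528981 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 528855 | 10010 | 20 | 10000 | 20 | 30000 | 1 | 10000 | 10
10024 | 30047 | 10011 | 11 | 10000 | 10 | 10036 | 30 | 529179 | 10046 | 20 | 10048 | 20 | 30000 | 1 | 10000 | 10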
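
Note on the Test 3 results: both "Result" figures can be reproduced from the cycle (02) column by taking a median over the ten runs and dividing by unrolls × iterations. The sketch below shows that arithmetic. The helper name cycles_per_copy and the choice of the lower median are assumptions; the page does not say how the median is taken or whether loop overhead is subtracted. The same arithmetic does not land exactly on the Test 2 figures, so treat this as an illustration rather than the harness's actual method.

```python
from statistics import median_low


def cycles_per_copy(cycles, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the test code."""
    # Assumption: lower median, with no correction for loop overhead.
    return median_low(cycles) / (unrolls * iterations)


# cycle (02) values from the ten Test 3 runs at 100 unrolls and 100 iterations
runs_100x100 = [30156, 30047, 30040, 30047, 30040, 30049, 30051, 30040, 30040, 30040]
# cycle (02) values from the ten Test 3 runs at 1000 unrolls and 10 iterations
runs_1000x10 = [30165, 30047, 30047, 30047, 30047, 30096, 30047, 30047, 30047, 30047]

print(f"{cycles_per_copy(runs_100x100, 100, 100):.4f}")  # 3.0040
print(f"{cycles_per_copy(runs_1000x10, 1000, 10):.4f}")  # 3.0047
```

Both values match the published Test 3 results (3.0040 and 3.0047).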