Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PSTL1STRM)

Test 1: uops

Code:

  prfm pstl1strm, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch ldst uop (58), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

    01    02    52  53    55    58     5a    78    7d    80  e9    ed
  1004  2442  1001   1  1000  1000  34906  1000  1000  1000   1  1000
  1004  2097  1001   1  1000  1000  34906  1000  1000  1000   1  1000
  1004  2097  1001   1  1000  1000  34906  1000  1000  1000   1  1000
  1004  2092  1001   1  1000  1000  34926  1000  1000  1000   1  1000
  1004  2101  1001   1  1000  1000  34758  1000  1000  1000   1  1000
  1004  2086  1001   1  1000  1000  35048  1000  1000  1000   1  1000
  1004  2098  1001   1  1000  1000  35060  1000  1000  1000   1  1000
  1004  2074  1001   1  1000  1000  34926  1000  1000  1000   1  1000
  1004  2097  1001   1  1000  1000  34906  1000  1000  1000   1  1000
  1004  2097  1001   1  1000  1000  34906  1000  1000  1000   1  1000

Test 2: throughput

Code:

  prfm pstl1strm, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0105

Counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59      5a     78     7c     7d     7f     80     e9     ed     ef
  20204  20826  20101  10101  10000  10103  10006  61289  350633  20117  10213  10013  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10037  61610  349718  20178  10243  10044  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61241  350829  20104  10206  10006  10202  10002  10001  10000  10100
  20204  20129  20105  10105  10000  10110  10000  61188  350439  20106  10208  10008  10206  10006  10002  10000  10100
  20204  20135  20101  10101  10000  10100  10000  61256  350623  20100  10202  10002  10202  10002  10001  10000  10100

1000 unrolls and 10 iterations

Result (median cycles for code): 1.9734

Counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59      5a     78     7c     7d     7f     80     e9     ed     ef
  20024  21148  20013  10013  10000  10017  10000  60979  353561  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20295  20011  10011  10000  10010  10000  60993  353585  20010  10020  10000  10020  10000  10001  10000  10010

Test 3: throughput

Code:

  prfm pstl1strm, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0503

Counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52   53     55   56     58   59      5a     78   7c     7d   7f     80  e9     ed   ef
  10204  20938  10101  101  10000  100  10000  300  365120  10100  200  10008  200  10012   1  10000  100
  10204  20507  10101  101  10000  100  10000  300  361262  10100  200  10008  200  10014   1  10000  100
  10205  20535  10131  101  10030  100  10006  300  355534  10106  200  10012  200  10008   1  10000  100
  10204  20473  10101  101  10000  100  10002  300  356704  10102  200  10012  200  10008   1  10000  100
  10204  20466  10101  101  10000  100  10000  300  357086  10100  200  10008  200  10008   1  10000  100
  10204  20504  10101  101  10000  100  10000  300  356790  10100  200  10004  200  10004   1  10000  100
  10204  20503  10101  101  10000  100  10006  300  357346  10106  200  10012  200  10012   1  10000  100
  10204  20503  10101  101  10000  100  10006  300  357346  10106  200  10012  200  10004   1  10000  100
  10204  20484  10101  101  10000  100  10000  300  356934  10100  200  10004  200  10008   1  10000  100
  10204  20503  10101  101  10000  100  10000  300  356986  10100  200  10004  200  10008   1  10000  100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0061

Counters (columns below are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52  53     55  56     58  59      5a     78  7c     7d  7f     80  e9     ed  ef
  10024  20056  10011  11  10000  10  10000  30  349210  10010  20  10000  20  10000   1  10000  10
  10024  20073  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10
  10024  20071  10011  11  10000  10  10000  30  350006  10010  20  10000  20  10000   1  10000  10
  10024  20044  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10
  10024  20048  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10
  10024  20048  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10
  10024  20083  10011  11  10000  10  10000  30  349718  10010  20  10000  20  10000   1  10000  10
  10024  20079  10011  11  10000  10  10000  30  349200  10010  20  10000  20  10000   1  10000  10
  10024  20048  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10
  10024  20048  10011  11  10000  10  10000  30  349214  10010  20  10000  20  10000   1  10000  10