Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

PRFM (register, PLIL3KEEP)

Test 1: uops

Code:

  prfm plil3keep, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

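retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2129 | 1001 | 1 | 1000 | 1000 | 35878 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2120 | 1001 | 1 | 1000 | 1000 | 35244 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2096 | 1001 | 1 | 1000 | 1000 | 34998 | 1000 | 1000 | 1000 | 1 | 1000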
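
The per-instruction figures listed for Test 1 can be related to the raw counters with simple division. The sketch below is only an illustration, not the page's actual tooling: it assumes each figure is a counter value divided by the number of unrolled copies, and it reads a table row as 1004 retired uops, 1001 scheduled uops, 1 integer uop and 1000 load/store uops. The names `row` and `per_instruction` are hypothetical.

```python
UNROLLS = 1000      # "1000 unrolls"
ITERATIONS = 1      # "1 iteration"

# Counter values read from one row of the Test 1 table, keyed by counter name.
row = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 1001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 1000,
}


def per_instruction(count, unrolls=UNROLLS, iterations=ITERATIONS):
    """Divide a raw counter value by the number of executed copies of the test code."""
    return count / (unrolls * iterations)


# Assumption: no overhead is subtracted before dividing.
rates = {name: per_instruction(count) for name, count in row.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

Under this assumption the integer (0.001) and load/store (1.000) figures match the listed values, while retires and issues come out at 1.004 and 1.001 instead of 1.000. That suggests, though the page does not say so, that the listed figures exclude a few uops of fixed overhead.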

Test 2: throughput

Code:

  prfm plil3keep, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.9994

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 20565 | 20102 | 10102 | 10000 | 10101 | 10000 | 61357 | 351907 | 20103 | 10205 | 10005 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100
20204 | 20196 | 20105 | 10105 | 10000 | 10104 | 10000 | 61276 | 351907 | 20104 | 10206 | 10006 | 10206 | 10006 | 10005 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0067

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 23267 | 20014 | 10014 | 10000 | 10019 | 10008 | 61211 | 349625 | 20031 | 10033 | 10014 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20160 | 20011 | 10011 | 10000 | 10010 | 10000 | 61164 | 348645 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20054 | 20011 | 10011 | 10000 | 10010 | 10000 | 61061 | 349955 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20068 | 20011 | 10011 | 10000 | 10010 | 10000 | 60940 | 351605 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20067 | 20011 | 10011 | 10000 | 10010 | 10000 | 61039 | 349535 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
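
As a rough cross-check, the result reported for this 1000-unroll run can be reproduced from the cycle (02) column of the table above. The sketch assumes the ten cycle counts are 23267, 20067, 20067, 20067, 20160, 20054, 20068, 20067, 20067 and 20067, and that the result is the median cycle count divided by unrolls times iterations. The variable names are hypothetical, and this is not necessarily how the page computes its figure.

```python
from statistics import median

UNROLLS = 1000      # "1000 unrolls"
ITERATIONS = 10     # "10 iterations"

# cycle (02) column of the Test 2 table for 1000 unrolls and 10 iterations.
cycles = [23267, 20067, 20067, 20067, 20160, 20054, 20068, 20067, 20067, 20067]

# Number of executed copies of the two-instruction test sequence.
copies = UNROLLS * ITERATIONS

# Assumption: the median damps outliers such as the slower first run.
cycles_per_copy = median(cycles) / copies

print(f"{cycles_per_copy:.4f}")  # prints 2.0067
```

The same arithmetic on the 100-unroll table gives about 2.02 rather than the reported 1.9994, so the agreement here should not be read as confirmation of the page's method.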

Test 3: throughput

Code:

  prfm plil3keep, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0503

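retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10204 | 20422 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357272 | 10100 | 200 | 10006 | 200 | 10012 | 1 | 10000 | 100
10204 | 20764 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 359660 | 10100 | 200 | 10008 | 200 | 10071 | 1 | 10000 | 100
10204 | 20485 | 10101 | 101 | 10000 | 100 | 10002 | 300 | 356174 | 10102 | 200 | 10010 | 200 | 10012 | 1 | 10000 | 100
10204 | 20418 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357140 | 10100 | 200 | 10008 | 200 | 10008 | 1 | 10000 | 100
10204 | 20381 | 10101 | 101 | 10000 | 100 | 10004 | 300 | 354798 | 10104 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20493 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357212 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20498 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357370 | 10100 | 200 | 10008 | 200 | 10004 | 1 | 10000 | 100
10204 | 20520 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357506 | 10100 | 200 | 10004 | 200 | 10004 | 1 | 10000 | 100
10204 | 20485 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 355834 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20503 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 357346 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100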

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0905

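retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 19554 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 334668 | 10010 | 20 | 10004 | 22 | 10234 | 2 | 10000 | 10
10024 | 19285 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 337062 | 10010 | 20 | 10004 | 20 | 10004 | 1 | 10000 | 10
10024 | 18883 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 331122 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18725 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 333646 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18726 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 334598 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19490 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 339568 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18734 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 336398 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19445 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 332468 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19253 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 337632 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19009 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 338418 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10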