Apple Microarchitecture Research by Dougall Johnson

M1/A14 P-core (Firestorm): Overview | Base Instructions | SIMD and FP Instructions
M1/A14 E-core (Icestorm):  Overview | Base Instructions | SIMD and FP Instructions

PRFM (register, PLIL1KEEP)

Test 1: uops

Code:

  prfm plil1keep, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000
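
As a rough cross-check, the per-instruction figures above look like raw counter values divided by the number of unrolled copies. The sketch below assumes that normalization; the page does not state how the figures were derived or whether measurement overhead was subtracted. The counter values are read from the first counter row of this test, under the assumption that the row splits as retire uop 1004, schedule uop 1001, schedule int uop 1, schedule ldst uop 1000. The helper name `per_instruction` is hypothetical.

```python
# Hypothetical helper: normalize raw counter values by the unroll count.
UNROLLS = 1000  # "1000 unrolls and 1 iteration"

# Values read from the first counter row of Test 1 (assumed column split).
counters = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 1001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 1000,
}


def per_instruction(raw, unrolls=UNROLLS):
    """Divide each raw counter value by the number of unrolled copies."""
    return {name: value / unrolls for name, value in raw.items()}


ratios = per_instruction(counters)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Under this assumption the load/store and integer figures come out as 1.000 and 0.001, matching the summary. The retire and total-issue figures come out slightly above 1.000 (1.004 and 1.001), which suggests the reported values include a small correction that is not shown here.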

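retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2059 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 34136 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2058 | 1001 | 1 | 1000 | 1000 | 35270 | 1000 | 1000 | 1000 | 1 | 1000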

Test 2: throughput

Code:

  prfm plil1keep, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0011

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 20361 | 20228 | 10168 | 10060 | 10171 | 10000 | 61697 | 347130 | 20103 | 10205 | 10005 | 10213 | 10013 | 10006 | 10000 | 10100
20204 | 20073 | 20105 | 10105 | 10000 | 10108 | 10000 | 61565 | 347109 | 20103 | 10205 | 10005 | 10212 | 10012 | 10005 | 10000 | 10100
20204 | 20028 | 20102 | 10102 | 10000 | 10104 | 10004 | 61394 | 348781 | 20114 | 10212 | 10012 | 10202 | 10002 | 10001 | 10000 | 10100
20204 | 20049 | 20105 | 10105 | 10000 | 10110 | 10004 | 61394 | 348727 | 20114 | 10212 | 10012 | 10212 | 10012 | 10005 | 10000 | 10100
20204 | 19980 | 20101 | 10101 | 10000 | 10100 | 10000 | 61398 | 347537 | 20100 | 10202 | 10002 | 10210 | 10010 | 10005 | 10000 | 10100
20204 | 20009 | 20105 | 10105 | 10000 | 10110 | 10000 | 61656 | 347181 | 20104 | 10206 | 10006 | 10212 | 10012 | 10005 | 10000 | 10100
20204 | 20023 | 20105 | 10105 | 10000 | 10110 | 10000 | 61588 | 347321 | 20100 | 10202 | 10002 | 10212 | 10012 | 10005 | 10000 | 10100
20204 | 19964 | 20105 | 10105 | 10000 | 10110 | 10004 | 61394 | 348745 | 20114 | 10212 | 10012 | 10202 | 10002 | 10001 | 10000 | 10100
20204 | 19978 | 20101 | 10101 | 10000 | 10100 | 10000 | 61730 | 346625 | 20102 | 10204 | 10004 | 10210 | 10010 | 10003 | 10000 | 10100
20204 | 19918 | 20103 | 10103 | 10000 | 10108 | 10006 | 61141 | 347855 | 20118 | 10214 | 10014 | 10210 | 10010 | 10005 | 10000 | 10100
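
The "100 unrolls and 100 iterations" wording, together with the "(fused SUBS/B.cc loop)" note, suggests that the two-instruction body is repeated back to back and wrapped in a counted loop. The page does not show the harness itself, so the following is only a sketch of how such a test body might be generated. The function name `build_test`, the loop counter register `x9`, the `b.ne` condition, and the GNU-assembler-style local label `1:` are all assumptions made for illustration.

```python
def build_test(body, unrolls, iterations, counter_reg="x9"):
    """Return assembly text with `body` repeated `unrolls` times in a counted loop.

    The loop tail is a SUBS followed by a conditional branch, the pairing the
    page describes as fused. Register and label choices are hypothetical.
    """
    lines = [f"  mov {counter_reg}, #{iterations}", "1:"]
    for _ in range(unrolls):
        lines.extend(f"  {insn}" for insn in body)
    lines.append(f"  subs {counter_reg}, {counter_reg}, #1")
    lines.append("  b.ne 1b")
    return "\n".join(lines)


asm = build_test(
    ["prfm plil1keep, [x6]", "add x6, x6, 64"],
    unrolls=100,
    iterations=100,
)
executed_copies = 100 * 100  # copies of the two-instruction body that run
print(asm.count("prfm"), executed_copies)
```

With these parameters the generated text holds 100 static copies of the body, and the loop executes 10000 copies in total, which is the divisor used when a cycle count is expressed per copy of the code.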

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0131

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 23907 | 20011 | 10011 | 10000 | 10012 | 10004 | 61119 | 350635 | 20023 | 10031 | 10011 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20131 | 20011 | 10011 | 10000 | 10010 | 10000 | 60992 | 350701 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
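
The 2.0131 result for this run is consistent with taking the median of the cycle (02) counter over the ten runs and dividing by the number of executed copies of the code (unrolls times iterations). The sketch below assumes that definition; the page does not spell out the exact procedure. The cycle values are read from the ten rows above under the assumption that the second field of each row is the cycle count.

```python
from statistics import median

UNROLLS, ITERATIONS = 1000, 10

# cycle (02) values for the ten runs: one slower first run, then nine equal runs.
cycles = [23907] + [20131] * 9

# Median cycles per executed copy of the two-instruction body.
cycles_per_copy = median(cycles) / (UNROLLS * ITERATIONS)
print(f"{cycles_per_copy:.4f}")  # prints 2.0131
```

Using the median rather than the mean keeps the slower first run (23907 cycles) from skewing the figure.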

Test 3: throughput

Code:

  prfm plil1keep, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0058

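retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349012 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20058 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 349160 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100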

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0958

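retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 20049 | 10011 | 11 | 10000 | 10 | 10006 | 30 | 349222 | 10016 | 20 | 10012 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20075 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349360 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20122 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 350082 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20103 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349638 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20266 | 10041 | 11 | 10030 | 10 | 10000 | 30 | 360786 | 10010 | 20 | 10006 | 20 | 10000 | 1 | 10000 | 10
10024 | 20751 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 362524 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20833 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 363238 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20748 | 10011 | 11 | 10000 | 10 | 10096 | 30 | 366998 | 10106 | 20 | 10114 | 20 | 10000 | 1 | 10000 | 10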