Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PLIL2KEEP)

Test 1: uops

Code:

  prfm plil2keep, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

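retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2100 | 1001 | 1 | 1000 | 1000 | 35364 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2117 | 1001 | 1 | 1000 | 1000 | 35538 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2093 | 1001 | 1 | 1000 | 1000 | 35068 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2090 | 1001 | 1 | 1000 | 1000 | 35356 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2074 | 1001 | 1 | 1000 | 1000 | 35246 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2146 | 1001 | 1 | 1000 | 1000 | 35084 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2121 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000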
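
To relate the raw counters to the per-instruction figures listed under Test 1, the sketch below divides a few counter values from the first run by the number of executed copies of the code (1000 unrolls, 1 iteration). The helper name per_copy and the choice of counters are illustrative assumptions, not part of the original tooling. The raw ratios come out slightly above the summary figures for some counters (for example 1.004 retired uops against "Retires: 1.000"); how the page derives its summary values from the raw counts is not stated, so the code makes no attempt to reproduce them exactly.

```python
# Hypothetical helper: normalise raw counter values by the number of
# executed copies of the test code. Values are from the first run of Test 1.
UNROLLS = 1000      # "1000 unrolls and 1 iteration"
ITERATIONS = 1

first_run = {
    "retire uop (01)": 1004,
    "schedule uop (52)": 1001,
    "schedule int uop (53)": 1,
    "schedule ldst uop (55)": 1000,
}


def per_copy(count, unrolls=UNROLLS, iterations=ITERATIONS):
    """Return the raw counter value per executed copy of the code."""
    return count / (unrolls * iterations)


# Raw ratios only; no overhead correction is applied (assumption: the
# page's summary figures may apply one).
rates = {name: per_copy(count) for name, count in first_run.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

The load/store ratio is exactly 1.000 and the integer ratio is 0.001, which agrees with the "Load/store unit issues" and "Integer unit issues" lines above.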

Test 2: throughput

Code:

  prfm plil2keep, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0067

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 20811 | 20102 | 10102 | 10000 | 10104 | 10000 | 61318 | 349227 | 20103 | 10205 | 10005 | 10238 | 10036 | 10033 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10103 | 10000 | 61331 | 348431 | 20102 | 10204 | 10004 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20067 | 20102 | 10102 | 10000 | 10104 | 10000 | 61316 | 349287 | 20104 | 10206 | 10006 | 10206 | 10006 | 10002 | 10000 | 10100
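
As a rough illustration of how the "median cycles for code" figure for this configuration relates to the ten runs above, the sketch below takes the cycle (02) value of each run, computes the median, and divides by the number of executed copies of the code (100 unrolls times 100 iterations). That this is how the figure is derived is an assumption; the function name median_cycles_per_copy is hypothetical.

```python
from statistics import median

UNROLLS = 100
ITERATIONS = 100

# cycle (02) values of the ten runs listed above for this configuration
cycles = [20811, 20067, 20067, 20067, 20067,
          20067, 20067, 20067, 20067, 20067]


def median_cycles_per_copy(samples, unrolls, iterations):
    """Median cycle count divided by the number of executed code copies.

    Assumption: no loop or harness overhead is subtracted.
    """
    return median(samples) / (unrolls * iterations)


result = median_cycles_per_copy(cycles, UNROLLS, ITERATIONS)
print(f"{result:.4f}")  # prints 2.0067
```

Using the median rather than the mean keeps the single slower first run (20811 cycles) from shifting the figure.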

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0011

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 20538 | 20016 | 10016 | 10000 | 10021 | 10000 | 61037 | 349717 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20025 | 20178 | 20072 | 10042 | 10030 | 10041 | 10000 | 61034 | 350557 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20025 | 20205 | 20072 | 10042 | 10030 | 10041 | 10000 | 60730 | 352511 | 20010 | 10020 | 10000 | 10058 | 10039 | 10034 | 10000 | 10010
20024 | 20078 | 20011 | 10011 | 10000 | 10010 | 10000 | 61126 | 350431 | 20010 | 10020 | 10000 | 10056 | 10036 | 10033 | 10000 | 10010
20024 | 20139 | 20011 | 10011 | 10000 | 10010 | 10000 | 60915 | 351103 | 20010 | 10020 | 10000 | 10054 | 10034 | 10033 | 10000 | 10010
20024 | 20195 | 20011 | 10011 | 10000 | 10010 | 10000 | 61027 | 350831 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20139 | 20011 | 10011 | 10000 | 10010 | 10000 | 61034 | 350557 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20139 | 20011 | 10011 | 10000 | 10010 | 10000 | 61034 | 350557 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20139 | 20011 | 10011 | 10000 | 10010 | 10000 | 60319 | 350427 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20195 | 20011 | 10011 | 10000 | 10010 | 10000 | 61034 | 350557 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010

Test 3: throughput

Code:

  prfm plil2keep, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0503

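retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10204 | 20509 | 10101 | 101 | 10000 | 100 | 10002 | 300 | 357182 | 10102 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20505 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357254 | 10100 | 200 | 10004 | 200 | 10004 | 1 | 10000 | 100
10204 | 20503 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357200 | 10100 | 200 | 10004 | 200 | 10008 | 1 | 10000 | 100
10204 | 20483 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357234 | 10100 | 200 | 10004 | 200 | 10008 | 1 | 10000 | 100
10204 | 20478 | 10101 | 101 | 10000 | 100 | 10004 | 300 | 357102 | 10104 | 200 | 10012 | 200 | 10008 | 1 | 10000 | 100
10204 | 20501 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357310 | 10100 | 200 | 10004 | 200 | 10012 | 1 | 10000 | 100
10204 | 20475 | 10101 | 101 | 10000 | 100 | 10002 | 300 | 357198 | 10102 | 200 | 10012 | 200 | 10004 | 1 | 10000 | 100
10204 | 20470 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357392 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20515 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357168 | 10100 | 200 | 10004 | 200 | 10004 | 1 | 10000 | 100
10204 | 20475 | 10101 | 101 | 10000 | 100 | 10144 | 313 | 363787 | 10248 | 204 | 10172 | 200 | 10008 | 1 | 10000 | 100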

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0888

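retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 20061 | 10011 | 11 | 10000 | 10 | 10002 | 30 | 349144 | 10012 | 20 | 10012 | 20 | 10000 | 1 | 10000 | 10
10024 | 19995 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20058 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20058 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20058 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20058 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20058 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349136 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19966 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 347048 | 10010 | 20 | 10000 | 20 | 10010 | 1 | 10000 | 10
10024 | 20708 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 355448 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20958 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 365426 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10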