Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PLDL2KEEP)

Test 1: uops

Code:

  prfm pldl2keep, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

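retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2068 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2056 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2051 | 1001 | 1 | 1000 | 1000 | 34126 | 1000 | 1000 | 1000 | 1 | 1000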

Test 2: throughput

Code:

  prfm pldl2keep, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0144

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 21121 | 20105 | 10105 | 10000 | 10110 | 10000 | 61419 | 350875 | 20105 | 10207 | 10007 | 10207 | 10007 | 10006 | 10000 | 10100
20204 | 20144 | 20106 | 10106 | 10000 | 10105 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100
20204 | 20144 | 20105 | 10105 | 10000 | 10106 | 10000 | 61440 | 350939 | 20106 | 10208 | 10008 | 10208 | 10008 | 10005 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0201

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 20720 | 20012 | 10012 | 10000 | 10015 | 10000 | 60977 | 352939 | 20015 | 10027 | 10007 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20201 | 20011 | 10011 | 10000 | 10010 | 10000 | 60875 | 352093 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
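
The two "Result (median cycles for code)" figures for Test 2 appear to match the median of the cycle (02) counter over the ten runs, divided by unrolls times iterations. That relationship is inferred from the numbers, not stated by the page. A minimal Python sketch under that assumption follows; the helper name cycles_per_copy is hypothetical.

```python
from statistics import median

def cycles_per_copy(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code.

    Assumption: this is how the reported result is derived; the page does not say so.
    """
    return median(cycle_samples) / (unrolls * iterations)

# cycle (02) counter values from the ten runs of Test 2
run_100x100 = [21121] + [20144] * 9   # 100 unrolls and 100 iterations
run_1000x10 = [20720] + [20201] * 9   # 1000 unrolls and 10 iterations

print(f"{cycles_per_copy(run_100x100, 100, 100):.4f}")  # 2.0144
print(f"{cycles_per_copy(run_1000x10, 1000, 10):.4f}")  # 2.0201
```

Taking the median rather than the mean keeps the slower first run (21121 and 20720 cycles) from skewing the figure.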

Test 3: throughput

Code:

  prfm pldl2keep, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.9298

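retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10204 | 20487 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 357176 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20473 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 355804 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20506 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357324 | 10100 | 200 | 10004 | 200 | 10012 | 1 | 10000 | 100
10204 | 20446 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 342400 | 10100 | 200 | 10006 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10014 | 1 | 10000 | 100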

1000 unrolls and 10 iterations

Result (median cycles for code): 1.8718

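retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 19996 | 10011 | 11 | 10000 | 10 | 10006 | 30 | 325212 | 10016 | 20 | 10012 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 18718 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 325246 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 19335 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 336884 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10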