Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PSTL3KEEP)

Test 1: uops

Code:

  prfm pstl3keep, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters (table columns are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch ldst uop (58), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

     01     02     52     53     55     58     5a     78     7d     80     e9     ed
   1004   2115   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
   1004   2097   1001      1   1000   1000  34906   1000   1000   1000      1   1000
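
The summary figures for this test (Retires, Issues, Integer unit issues, Load/store unit issues) appear to be raw counter totals divided by the number of copies of the measured code. Here that is 1000 unrolls and 1 iteration, so 1000 copies. The sketch below applies that normalisation to the first row of the table. Two things are assumptions. First, "Issues" is taken to correspond to schedule uop (52) and "Retires" to retire uop (01). Second, the small excess over the reported values (1004 retired uops rather than 1000) is presumably a fixed per-run overhead, which this sketch does not remove. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: one raw counter total from a single run. */
struct counter {
    const char *name; /* name as listed in the table legend */
    long total;       /* raw count for the whole run */
};

/* Normalise a raw total to "per copy of the measured code". */
static double per_copy(long total, long copies)
{
    assert(copies > 0);
    return (double)total / (double)copies;
}

/* First row of the Test 1 table (1000 unrolls x 1 iteration). */
static const struct counter test1_row[] = {
    {"retire uop (01)", 1004},
    {"schedule uop (52)", 1001},
    {"schedule int uop (53)", 1},
    {"schedule ldst uop (55)", 1000},
};

/* Print each counter as a per-copy rate with three decimals. */
static void print_rates(const struct counter *c, size_t n, long copies)
{
    for (size_t i = 0; i < n; i++)
        printf("%-24s %.3f\n", c[i].name, per_copy(c[i].total, copies));
}
```

Calling `print_rates(test1_row, sizeof test1_row / sizeof test1_row[0], 1000)` prints 1.004, 1.001, 0.001 and 1.000. These are close to, but not all identical with, the rounded summary lines.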

Test 2: throughput

Code:

  prfm pstl3keep, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations
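
To issue the same kind of prefetch from C, the sketch below is a rough analogue of the Test 2 kernel: one store-intent prefetch per 64-byte step through a buffer. It assumes GCC or Clang. `__builtin_prefetch(addr, 1, 1)` (write intent, locality 1) is expected to lower to `prfm pstl3keep` on AArch64, but that is worth confirming in the disassembly. The compiler chooses its own loop form and unrolling, so this will not reproduce the fused SUBS/B.cc loop or the unroll counts used here. The function name `prefetch_for_store` and the `STEP` macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define STEP 64 /* same stride as "add x6, x6, 64" */

/* Hypothetical analogue of Test 2: prefetch-for-store every STEP bytes of buf.
   With GCC/Clang on AArch64, __builtin_prefetch(p, 1, 1) is expected to become
   "prfm pstl3keep"; on other targets it maps to that target's prefetch hint,
   or to no code at all. Returns the number of prefetches issued. */
static size_t prefetch_for_store(const char *buf, size_t len)
{
    size_t issued = 0;
    assert(buf != NULL);
    for (size_t off = 0; off < len; off += STEP) {
        __builtin_prefetch(buf + off, 1, 1); /* rw = 1 (write), locality = 1 */
        issued++;
    }
    return issued;
}
```

Compiling with `-O2` and reading the `objdump -d` output is a quick way to check which PRFM variant was actually emitted.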

Result (median cycles for code): 2.0143

Counters (table columns are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  20204  20783  20103  10103  10000  10108  10000  61139 351287  20101  10203  10003  10204  10004  10001  10000  10100
  20204  20135  20101  10101  10000  10102  10000  61156 351501  20102  10204  10004  10204  10004  10001  10000  10100
  20204  20089  20101  10101  10000  10102  10000  61200 351151  20106  10208  10008  10208  10008  10002  10000  10100
  20204  20086  20101  10101  10000  10102  10000  61096 351105  20102  10204  10004  10204  10004  10001  10000  10100
  20204  20088  20101  10101  10000  10102  10000  61082 351429  20102  10204  10004  10204  10004  10001  10000  10100
  20204  20091  20101  10101  10000  10102  10000  61156 351375  20102  10204  10004  10210  10010  10003  10000  10100
  20204  20079  20101  10101  10000  10102  10000  61246 350853  20102  10204  10004  10204  10004  10001  10000  10100
  20204  20139  20101  10101  10000  10102  10004  61223 350429  20114  10212  10012  10208  10008  10002  10000  10100
  20204  20206  20102  10102  10000  10103  10000  61284 350727  20103  10205  10005  10208  10008  10002  10000  10100
  20204  20120  20102  10102  10000  10104  10000  61296 351067  20104  10206  10006  10204  10004  10001  10000  10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0201

Counters (table columns are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  20024  20851  20012  10012  10000  10016  10000  60916 352057  20015  10027  10007  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20201  20011  10011  10000  10010  10000  60875 352093  20010  10020  10000  10020  10000  10001  10000  10010
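
The "Result (median cycles for code)" lines read as the median cycle count across runs, divided by unrolls times iterations. The sketch below applies that reading to the cycle (02) column of the 1000-unroll table: nine runs at 20201 cycles and one at 20851, over 10000 copies. The result is 2.0201, matching the reported value. The same calculation on the ten rows listed for 100 unrolls gives about 2.0106 rather than 2.0143. The reported figure may therefore be taken over more runs than are shown. Treat this as an interpretation, not the harness's actual code; the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for long values, ascending. */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Median of per-run cycle totals, divided by the number of copies of the
   measured code (unrolls * iterations). Sorts `cycles` in place. */
static double median_cycles_per_copy(long *cycles, size_t runs, long copies)
{
    assert(runs > 0 && copies > 0);
    qsort(cycles, runs, sizeof cycles[0], cmp_long);
    double mid = (runs % 2 != 0)
        ? (double)cycles[runs / 2]
        : ((double)cycles[runs / 2 - 1] + (double)cycles[runs / 2]) / 2.0;
    return mid / (double)copies;
}
```

For example, passing the ten cycle values from the 1000-unroll table with `copies = 1000 * 10` returns 2.0201.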

Test 3: throughput

Code:

  prfm pstl3keep, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0503

Counters (table columns are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  10204  20504  10101    101  10000    100  10049    300 354611  10149    200  10065    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20507  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
  10204  20503  10101    101  10000    100  10006    300 357346  10106    200  10012    200  10012      1  10000    100
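
Test 2 and Test 3 differ only in the second instruction (add x6, x6, 64 versus mov x7, 8). Both use 100 unrolls and 100 iterations, so subtracting matching counters and dividing by the 10000 copies attributes the difference to that instruction. The sketch below uses the second data row of each 100-unroll table. The difference is 1.0 per copy for both retire uop (01) and schedule int uop (53). In Test 3 itself those counters come to only about 1.02 and 0.01 per copy, so the mov x7, 8 adds almost nothing to either. That is consistent with the immediate move being handled without issuing to an integer unit. This reading is an interpretation of the counters, not something the tables state directly. The struct and constant names are hypothetical.

```c
#include <assert.h>

/* Hypothetical record: two counters from one run, copied from the tables. */
struct run {
    long retire_uop;       /* retire uop (01) */
    long schedule_int_uop; /* schedule int uop (53) */
};

/* Per-copy difference between two runs with the same copy count. */
static double delta_per_copy(long a, long b, long copies)
{
    assert(copies > 0);
    return (double)(a - b) / (double)copies;
}

enum { COPIES = 100 * 100 }; /* 100 unrolls x 100 iterations */

/* Second data row of each 100-unroll table. */
static const struct run test2 = {20204, 10101}; /* prfm + add x6, x6, 64 */
static const struct run test3 = {10204, 101};   /* prfm + mov x7, 8 */
```

Both `delta_per_copy(test2.retire_uop, test3.retire_uop, COPIES)` and the corresponding schedule int uop difference evaluate to 1.0.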

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0503

Counters (table columns are labelled by event number): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  10024  19506  10011     11  10000     10  10004     30 334496  10014     20  10012     20  10000      1  10000     10
  10024  19282  10011     11  10000     10  10000     30 337250  10010     20  10000     20  10000      1  10000     10
  10024  19266  10011     11  10000     10  10000     30 335426  10010     20  10000     20  10000      1  10000     10
  10024  19387  10011     11  10000     10  10000     30 337810  10010     20  10000     20  10000      1  10000     10
  10024  20472  10011     11  10000     10  10000     30 356832  10010     20  10004     20  10000      1  10000     10
  10024  20462  10011     11  10000     10  10000     30 357084  10010     20  10000     20  10000      1  10000     10
  10024  20503  10011     11  10000     10  10000     30 357322  10010     20  10000     20  10000      1  10000     10
  10024  20503  10011     11  10000     10  10000     30 357322  10010     20  10000     20  10000      1  10000     10
  10024  20503  10011     11  10000     10  10000     30 357322  10010     20  10000     20  10000      1  10000     10
  10024  20503  10011     11  10000     10  10000     30 357322  10010     20  10000     20  10000      1  10000     10