Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PLIL2STRM)

Test 1: uops

Code:

  prfm plil2strm, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

Counters (event code in parentheses, used as the column heading below): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch ldst uop (58), simd uops in schedulers (5a), dispatch uop (78), map ldst uop (7d), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed)

     01     02     52     53     55     58     5a     78     7d     80     e9     ed
   1004   2124   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2143   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2143   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2143   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2143   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2119   1001      1   1000   1000  35472   1000   1000   1000      1   1000
   1004   2118   1001      1   1000   1000  35418   1000   1000   1000      1   1000
   1004   2118   1001      1   1000   1000  35462   1000   1000   1000      1   1000
   1004   2119   1001      1   1000   1000  35686   1000   1000   1000      1   1000
   1004   2143   1001      1   1000   1000  35686   1000   1000   1000      1   1000

Test 2: throughput

Code:

  prfm plil2strm, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0215

Counters (event code in parentheses, used as the column heading below): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  20204  21138  20102  10102  10000  10101  10006  61184 351793  20117  10213  10013  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20136  20103  10103  10000  10108  10000  61447 350269  20104  10206  10006  10206  10006  10002  10000  10100
  20204  20118  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100
  20204  20215  20106  10106  10000  10112  10006  61149 351843  20118  10214  10014  10214  10014  10006  10000  10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0056

Counters (event code in parentheses, used as the column heading below): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  20024  20967  20016  10016  10000  10022  10006  61075 349457  20028  10034  10014  10020  10000  10001  10000  10010
  20024  19896  20011  10011  10000  10010  10000  61019 349399  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61019 349421  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61019 349421  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61019 349421  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61019 349421  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61019 349421  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20056  20011  10011  10000  10010  10000  61152 347169  20010  10020  10000  10020  10000  10001  10000  10010
  20024  20015  20011  10011  10000  10010  10000  61214 346911  20010  10020  10000  10020  10000  10001  10000  10010
  20024  19969  20011  10011  10000  10010  10000  61506 347537  20010  10020  10000  10020  10000  10001  10000  10010
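
The two "median cycles for code" figures in Test 2 appear to follow from the cycle counts of the ten samples of each run: take the median and divide by unrolls times iterations. The Python sketch below checks that reading. It assumes the second counter in each sample row is the cycle (02) count; the names `runs` and `cycles_per_copy` are illustrative and not part of the original tooling, and the relation is an observation about these two runs rather than a documented definition.

```python
from statistics import median

# Cycle (02) counts for the ten samples of each Test 2 run, keyed by
# (unrolls, iterations). Splitting each row into fields is an assumption.
runs = {
    (100, 100): [21138, 20215, 20215, 20136, 20118,
                 20215, 20215, 20215, 20215, 20215],
    (1000, 10): [20967, 19896, 20056, 20056, 20056,
                 20056, 20056, 20056, 20015, 19969],
}

def cycles_per_copy(samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code."""
    return median(samples) / (unrolls * iterations)

# Round to four decimals, the precision used in the reported results.
results = {
    key: round(cycles_per_copy(samples, *key), 4)
    for key, samples in runs.items()
}

for (unrolls, iterations), value in results.items():
    print(f"{unrolls} unrolls x {iterations} iterations: {value}")
```

Both runs execute 10000 copies of the two-instruction sequence, so the medians 20215 and 20056 give the reported 2.0215 and 2.0056 cycles per copy.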

Test 3: throughput

Code:

  prfm plil2strm, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0069

Counters (event code in parentheses, used as the column heading below): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  10204  20082  10101    101  10000    100  10000    300 349212  10100    200  10008    200  10008      1  10000    100
  10204  20079  10101    101  10000    100  10000    300 349778  10100    200  10008    200  10008      1  10000    100
  10204  20076  10101    101  10000    100  10000    300 349738  10100    200  10008    200  10008      1  10000    100
  10204  20072  10101    101  10000    100  10000    300 349468  10100    200  10008    200  10004      1  10000    100
  10204  20092  10101    101  10000    100  10000    300 349730  10100    200  10008    200  10004      1  10000    100
  10204  20075  10101    101  10000    100  10000    300 349464  10100    200  10004    200  10004      1  10000    100
  10204  20075  10101    101  10000    100  10000    300 349476  10100    200  10004    200  10008      1  10000    100
  10204  20073  10101    101  10000    100  10000    300 349612  10100    200  10008    200  10008      1  10000    100
  10204  20066  10101    101  10000    100  10006    300 349222  10106    200  10012    200  10012      1  10000    100
  10204  20058  10101    101  10000    100  10000    300 349442  10100    200  10004    200  10008      1  10000    100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0641

Counters (event code in parentheses, used as the column heading below): retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule ldst uop (55), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? int retires (ef)

     01     02     52     53     55     56     58     59     5a     78     7c     7d     7f     80     e9     ed     ef
  10024  19231  10011     11  10000     10  10000     30 328922  10010     20  10004     20  10000      1  10000     10
  10024  19380  10011     11  10000     10  10000     30 333686  10010     20  10000     20  10000      1  10000     10
  10024  18718  10011     11  10000     10  10000     30 325246  10010     20  10000     20  10000      1  10000     10
  10024  18718  10011     11  10000     10  10000     30 325246  10010     20  10000     20  10000      1  10000     10
  10024  18718  10011     11  10000     10  10000     30 325246  10010     20  10000     20  10000      1  10000     10
  10024  18815  10011     11  10000     10  10000     30 325246  10010     20  10000     20  10000      1  10000     10
  10024  19524  10011     11  10000     10  10000     30 335714  10010     20  10000     20  10000      1  10000     10
  10024  19203  10011     11  10000     10  10000     30 332844  10010     20  10000     20  10072      1  10000     10
  10024  19546  10011     11  10000     10  10000     30 334208  10010     20  10000     20  10000      1  10000     10
  10024  19424  10011     11  10000     10  10000     30 326766  10010     20  10000     20  10000      1  10000     10