Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PLDL3STRM)

Test 1: uops

Code:

  prfm pldl3strm, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

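retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2149 | 1001 | 1 | 1000 | 1000 | 35142 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2123 | 1001 | 1 | 1000 | 1000 | 35606 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2074 | 1001 | 1 | 1000 | 1000 | 35158 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2123 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000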

Test 2: throughput

Code:

  prfm pldl3strm, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0026

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 20897 | 20102 | 10102 | 10000 | 10106 | 10000 | 60362 | 355153 | 20101 | 10203 | 10003 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20384 | 20101 | 10101 | 10000 | 10101 | 10000 | 60407 | 355173 | 20102 | 10204 | 10004 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20384 | 20101 | 10101 | 10000 | 10102 | 10000 | 60407 | 355173 | 20102 | 10204 | 10004 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20237 | 20101 | 10101 | 10000 | 10102 | 10000 | 60407 | 355173 | 20102 | 10204 | 10004 | 10206 | 10006 | 10002 | 10000 | 10100
20204 | 20188 | 20101 | 10101 | 10000 | 10102 | 10000 | 60319 | 354363 | 20102 | 10204 | 10004 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20229 | 20101 | 10101 | 10000 | 10102 | 10006 | 61037 | 353629 | 20118 | 10214 | 10014 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20384 | 20101 | 10101 | 10000 | 10102 | 10000 | 60407 | 355173 | 20102 | 10204 | 10004 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20376 | 20101 | 10101 | 10000 | 10102 | 10000 | 60407 | 355173 | 20102 | 10204 | 10004 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20345 | 20103 | 10103 | 10000 | 10108 | 10008 | 60757 | 350951 | 20121 | 10213 | 10014 | 10211 | 10011 | 10006 | 10000 | 10100
20204 | 20077 | 20106 | 10106 | 10000 | 10109 | 10004 | 61694 | 347659 | 20114 | 10212 | 10012 | 10212 | 10012 | 10005 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0240

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 21156 | 20011 | 10011 | 10000 | 10013 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20240 | 20011 | 10011 | 10000 | 10010 | 10000 | 60820 | 352731 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
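
The 2.0240 figure for this configuration can be reproduced from the cycle (02) counts of the ten runs. The following is a minimal sketch, assuming the result is the median cycle count divided by unrolls times iterations. The page does not state the exact normalisation, and the variable names are illustrative only.

```python
from statistics import median

# Cycle (02) counts of the ten runs for 1000 unrolls and 10 iterations.
cycles = [21156] + [20240] * 9

unrolls, iterations = 1000, 10

# Assumption: "cycles for code" means the median cycle count divided by
# the number of executed copies of the two-instruction test code.
cycles_per_copy = median(cycles) / (unrolls * iterations)

print(f"{cycles_per_copy:.4f}")  # prints 2.0240
```

The median discards the slower first run (21156 cycles), which is probably a warm-up effect.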

Test 3: throughput

Code:

  prfm pldl3strm, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0818

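retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10204 | 20889 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 363866 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20901 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 360326 | 10100 | 200 | 10008 | 200 | 10008 | 1 | 10000 | 100
10204 | 20663 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 359274 | 10100 | 200 | 10004 | 200 | 10008 | 1 | 10000 | 100
10204 | 21017 | 10195 | 105 | 10090 | 104 | 10000 | 300 | 360416 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20719 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 358574 | 10106 | 200 | 10012 | 200 | 10004 | 1 | 10000 | 100
10205 | 20933 | 10163 | 103 | 10060 | 102 | 10000 | 300 | 363944 | 10100 | 200 | 10008 | 200 | 10004 | 1 | 10000 | 100
10204 | 21034 | 10131 | 101 | 10030 | 100 | 10000 | 300 | 363598 | 10100 | 200 | 10008 | 200 | 10008 | 1 | 10000 | 100
10204 | 20828 | 10101 | 101 | 10000 | 100 | 10098 | 310 | 367916 | 10201 | 204 | 10120 | 200 | 10004 | 1 | 10000 | 100
10204 | 20862 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 364956 | 10100 | 200 | 10004 | 200 | 10012 | 1 | 10000 | 100
10204 | 20908 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 363550 | 10100 | 200 | 10004 | 200 | 10008 | 1 | 10000 | 100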

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0048

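retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 20047 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349210 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349424 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20048 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 349214 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10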