Apple Microarchitecture Research by Dougall Johnson

PRFM (register, PLIL1STRM)

Test 1: uops

Code:

  prfm plil1strm, [x6]
  mov x0, 0

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 0.001

Load/store unit issues: 1.000

SIMD/FP unit issues: 0.000

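retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch ldst uop (58) | simd uops in schedulers (5a) | dispatch uop (78) | map ldst uop (7d) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed)
1004 | 2122 | 1001 | 1 | 1000 | 1000 | 35216 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2119 | 1001 | 1 | 1000 | 1000 | 35678 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2113 | 1001 | 1 | 1000 | 1000 | 35358 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2146 | 1001 | 1 | 1000 | 1000 | 35472 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2144 | 1001 | 1 | 1000 | 1000 | 35674 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2119 | 1001 | 1 | 1000 | 1000 | 35410 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2103 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000
1004 | 2143 | 1001 | 1 | 1000 | 1000 | 35686 | 1000 | 1000 | 1000 | 1 | 1000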

Test 2: throughput

Code:

  prfm plil1strm, [x6]
  add x6, x6, 64

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 1.9617

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20204 | 21176 | 20103 | 10103 | 10000 | 10105 | 10000 | 60713 | 358965 | 20105 | 10207 | 10007 | 10207 | 10007 | 10003 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20322 | 20105 | 10105 | 10000 | 10106 | 10000 | 61173 | 352959 | 20104 | 10206 | 10006 | 10204 | 10004 | 10001 | 10000 | 10100
20204 | 20345 | 20103 | 10103 | 10000 | 10108 | 10000 | 61087 | 355133 | 20106 | 10208 | 10008 | 10210 | 10010 | 10003 | 10000 | 10100
20204 | 20369 | 20102 | 10102 | 10000 | 10106 | 10000 | 60997 | 355915 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100
20204 | 20600 | 20102 | 10102 | 10000 | 10106 | 10000 | 60789 | 359029 | 20106 | 10208 | 10008 | 10208 | 10008 | 10002 | 10000 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0168

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
20024 | 23150 | 20011 | 10011 | 10000 | 10012 | 10000 | 60992 | 351615 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20192 | 20011 | 10011 | 10000 | 10010 | 10000 | 61072 | 350633 | 20010 | 10020 | 10000 | 10056 | 10036 | 10035 | 10000 | 10010
20024 | 20068 | 20011 | 10011 | 10000 | 10010 | 10000 | 60964 | 352533 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20162 | 20011 | 10011 | 10000 | 10010 | 10000 | 61002 | 351339 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20162 | 20011 | 10011 | 10000 | 10010 | 10000 | 61072 | 351481 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20114 | 20011 | 10011 | 10000 | 10010 | 10000 | 61072 | 351463 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20161 | 20011 | 10011 | 10000 | 10010 | 10000 | 61003 | 351645 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20233 | 20011 | 10011 | 10000 | 10010 | 10000 | 61153 | 350939 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20148 | 20011 | 10011 | 10000 | 10010 | 10000 | 61070 | 351517 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
20024 | 20179 | 20011 | 10011 | 10000 | 10010 | 10000 | 61066 | 351499 | 20010 | 10020 | 10000 | 10020 | 10000 | 10001 | 10000 | 10010
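
Both Test 2 configurations (100 unrolls x 100 iterations, and 1000 unrolls x 10 iterations) execute the two-instruction body the same number of times; only the split between straight-line copies and loop trips differs. The sketch below illustrates that layout by generating the assembly text for such a loop. It is a hypothetical helper, not the harness used for these measurements: the counter register `x9`, the local label `1:` and the function names are all assumptions made for illustration.

```python
def unrolled_loop(body, unrolls, iterations, counter="x9"):
    """Return AArch64 assembly text: `body` repeated `unrolls` times inside
    a SUBS/B.NE loop that runs `iterations` times.

    The counter register and the local label are illustrative assumptions,
    not details taken from the original measurement harness.
    """
    lines = [f"  mov {counter}, #{iterations}", "1:"]
    for _ in range(unrolls):
        lines.extend(f"  {insn}" for insn in body)
    # SUBS followed by a conditional branch is the loop tail the page
    # describes as a "fused SUBS/B.cc loop".
    lines.append(f"  subs {counter}, {counter}, #1")
    lines.append("  b.ne 1b")
    return "\n".join(lines)


def dynamic_body_copies(unrolls, iterations):
    """Total number of times the body executes."""
    return unrolls * iterations


TEST2_BODY = ["prfm plil1strm, [x6]", "add x6, x6, 64"]

if __name__ == "__main__":
    asm = unrolled_loop(TEST2_BODY, unrolls=100, iterations=100)
    print("\n".join(asm.splitlines()[:4]))
    print(dynamic_body_copies(100, 100), dynamic_body_copies(1000, 10))
```

Raising the unroll count while lowering the iteration count keeps the body count fixed and shrinks the loop's share of the counters. That matches the integer-side counters in the second run, which sit closer to 10000 than in the first.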

Test 3: throughput

Code:

  prfm plil1strm, [x6]
  mov x7, 8

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 2.0503

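retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10205 | 19718 | 10133 | 103 | 10030 | 102 | 10006 | 300 | 325210 | 10106 | 200 | 10014 | 200 | 10012 | 1 | 10000 | 100
10204 | 19623 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 338820 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 18718 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 325270 | 10106 | 200 | 10012 | 202 | 10066 | 2 | 10000 | 100
10204 | 19088 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 334798 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 19442 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 357430 | 10100 | 200 | 10006 | 200 | 10004 | 1 | 10000 | 100
10204 | 20468 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 356888 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20427 | 10101 | 101 | 10000 | 100 | 10000 | 300 | 355474 | 10100 | 200 | 10008 | 200 | 10012 | 1 | 10000 | 100
10204 | 20503 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 357346 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100
10204 | 20503 | 10101 | 101 | 10000 | 100 | 10006 | 300 | 357346 | 10106 | 200 | 10012 | 200 | 10012 | 1 | 10000 | 100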

1000 unrolls and 10 iterations

Result (median cycles for code): 2.0503

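retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? int retires (ef)
10024 | 20803 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 363498 | 10010 | 20 | 10000 | 20 | 10006 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10
10024 | 20503 | 10011 | 11 | 10000 | 10 | 10000 | 30 | 357322 | 10010 | 20 | 10000 | 20 | 10000 | 1 | 10000 | 10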
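
For this last run, the reported result of 2.0503 is consistent with taking the median of the cycle (02) column and dividing by the 10,000 body executions (1000 unrolls x 10 iterations). The sketch below checks that arithmetic. It assumes each sample row starts with a five-digit retire count followed by a five-digit cycle count, and the function name is hypothetical. The same ratio does not exactly reproduce every other result on this page, so the original tooling may apply a correction that is not shown here.

```python
from statistics import median

# Cycle counts of the ten samples for Test 3, 1000 unrolls x 10 iterations.
# Assumption: the cycle (02) value is digits 6-10 of each sample row.
TEST3_CYCLES_1000X10 = [20803] + [20503] * 9


def cycles_per_body(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of body executions."""
    return median(cycle_samples) / (unrolls * iterations)


if __name__ == "__main__":
    # Prints 2.0503, matching the reported result for this run.
    print(f"{cycles_per_body(TEST3_CYCLES_1000X10, 1000, 10):.4f}")
```

Using the median rather than the mean keeps the single slower first sample (20803 cycles) from skewing the figure.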