Apple Microarchitecture Research by Dougall Johnson

PACDZA

Test 1: uops

Code:

  pacdza x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), ? int output thing (e9), ? int retires (ef)

  01    02    52    53    56     59    78    e9    ef
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000
1004  6029  1001  1001  1000  52825  1011  1001  1000
1004  6029  1001  1001  1000  52725  1000  1001  1000

Test 2: Latency 1->1

Code:

  pacdza x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029
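
The result above is consistent with taking the median of the cycle (02) counter over the ten runs recorded for this configuration and dividing it by the number of executed copies of the code (unrolls × iterations). A minimal Python sketch of that arithmetic, assuming this is how the figure is derived; the function name `cycles_per_copy` is hypothetical, and the cycle values assume the raw counter rows split as 5-digit retire and cycle fields:

```python
from statistics import median

def cycles_per_copy(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code."""
    return median(cycle_samples) / (unrolls * iterations)

# Cycle (02) values of the ten runs at 100 unrolls x 100 iterations
# (assumed split of the raw counter rows; one run is a slight outlier).
samples = [60029] * 4 + [60058] + [60029] * 5

latency = cycles_per_copy(samples, unrolls=100, iterations=100)
print(f"{latency:.4f}")  # 6.0029
```

Using the median rather than the mean keeps the single slower run (60058 cycles) from shifting the reported latency.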

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), map ldst uop inputs (80), ? int output thing (e9), ? ldst retires (ed), ? simd retires (ee), ? int retires (ef)

   01     02     52     53     56      59     78   7c   7f  80     e9  ed  ee     ef
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10205  60058  10204  10204  10211  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10101   0   0  10100
10204  60029  10201  10201  10200  530325  10200  200  200   0  10104   0   0  10100

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), schedule simd uop (54), schedule ldst uop (55), dispatch int uop (56), dispatch simd uop (57), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), map ldst uop inputs (80), map simd uop inputs (81), ? int output thing (e9), ? ldst retires (ed), ? simd retires (ee), ? int retires (ef)

   01     02     52     53  54  55     56  57      59     78  7c  7f  80  81     e9  ed  ee     ef
10024  61798  10193  10193   0   0  10528   0  537129  10596  20  22   0   0  10184   0   0  10010
10024  61883  10197  10197   0   0  10548   0  537288  10610  22  24   0   0  10189   0   0  10010
10024  62179  10221  10221   0   0  10620   0  536364  10536  204931410617991961493819285
10024  61925  10201  10201   0   0  10560   0  537435  10620  20  20   0   0  10187   0   0  10010
10024  62096  10219  10219   0   0  10610   0  536525  10548  20  20   0   0  10187   0   0  10010
10024  62102  10217  10217   0   0  10608   0  536523  10550  22  20   0   0  10211   0   0  10010
10024  61928  10203  10203   0   0  10562   0  536676  10562  22  21   0   0  10211   0   0  10010
10024  61880  10198  10198   0   0  10549   0  537298  10608  20  24   0   0  10189   0   0  10010
10024  62055  10217  10217   0   0  10600   0  537435  10620  20  22   0   0  10188   0   0  10010
10024  62054  10215  10215   0   0  10598   0  536517  10548  20  23   0   0  10212   0   0  10010

Test 3: throughput

Count: 8

Code:

  pacdza x0
  pacdza x1
  pacdza x2
  pacdza x3
  pacdza x4
  pacdza x5
  pacdza x6
  pacdza x7

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004
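
For the throughput test the same median is divided once more by the instruction count (8), giving cycles per instruction. A hedged Python sketch of that calculation, assuming the result is derived from the cycle (02) counter in this way; the function name `cycles_per_instruction` is hypothetical, and the cycle values assume the raw counter rows split as a 5-digit retire field followed by a 6-digit cycle field:

```python
from statistics import median

def cycles_per_instruction(cycle_samples, unrolls, iterations, count):
    """Median cycles divided by the total number of measured instructions."""
    return median(cycle_samples) / (unrolls * iterations * count)

# Cycle (02) values of the ten runs at 100 unrolls x 100 iterations
# (assumed split of the raw counter rows; two runs are slight outliers).
samples = [160030] * 5 + [160211] + [160030] * 2 + [160064] + [160030]

cpi = cycles_per_instruction(samples, unrolls=100, iterations=100, count=8)
print(f"{cpi:.4f}")      # 2.0004
print(f"{1 / cpi:.2f}")  # 0.50
```

Read this way, the measurement corresponds to roughly one independent pacdza completing every two cycles.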

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), int uops in schedulers (59), dispatch uop (78), map int uop (7c), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

   01      02     52     53     56       59     78   7c   7f     e9     ef
80204  160030  80201  80201  80202  1360430  80202  200  200  80101  80100
80204  160030  80201  80201  80202  1360481  80202  200  200  80111  80100
80204  160030  80201  80201  80202  1360481  80202  200  200  80101  80100
80204  160030  80201  80201  80202  1360573  80220  200  200  80101  80100
80204  160030  80201  80201  80202  1360481  80202  200  200  80111  80100
80204  160211  80250  80250  80275  1360481  80202  200  200  80101  80100
80204  160030  80201  80201  80202  1360481  80202  200  200  80101  80100
80204  160030  80201  80201  80202  1360481  80202  200  200  80101  80100
80205  160064  80210  80210  80220  1360481  80202  200  200  80101  80100
80204  160030  80201  80201  80202  1360580  80220  200  200  80101  80100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

Counters: retire uop (01), cycle (02), schedule uop (52), schedule int uop (53), dispatch int uop (56), dispatch ldst uop (58), int uops in schedulers (59), simd uops in schedulers (5a), dispatch uop (78), map int uop (7c), map ldst uop (7d), map int uop inputs (7f), ? int output thing (e9), ? int retires (ef)

   01      02     52     53     56  58       59  5a     78  7c  7d  7f     e9     ef
80024  160030  80021  80021  80022   0  1359941   0  80022  20   0  20  80011  80010
80025  160064  80031  80031  80040   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1360040   0  80040  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1360033   0  80040  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010
80024  160030  80021  80021  80020   0  1359931   0  80020  20   0  20  80011  80010