Apple Microarchitecture Research by Dougall Johnson

PACIZA

Test 1: uops

Code:

  paciza x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000

Test 2: Latency 1->1

Code:

  paciza x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

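retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10205 | 60058 | 10204 | 10204 | 10211 | 530425 | 10211 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100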
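
The reported 6.0029 can be reproduced from the ten runs above. The sketch below is not the author's harness, which is not shown here. It assumes that the cycle (02) reading is the second field of each row (60029 in nine runs, 60058 in one) and that "median cycles for code" means the median cycle count divided by unrolls times iterations. The function name is made up for illustration.

```python
from statistics import median

# cycle (02) readings for the ten runs of the 100-unroll, 100-iteration
# configuration. Reading them as the second field of each row is an
# assumption about where the column boundaries fall.
cycles = [60029, 60029, 60029, 60058, 60029,
          60029, 60029, 60029, 60029, 60029]

UNROLLS = 100
ITERATIONS = 100


def median_cycles_per_copy(samples, unrolls, iterations):
    """Median cycle count divided by the number of unrolled copies run."""
    return median(samples) / (unrolls * iterations)


latency = median_cycles_per_copy(cycles, UNROLLS, ITERATIONS)
print(f"{latency:.4f}")  # 6.0029
```

Taking the median rather than the mean keeps the single slower run (60058 cycles) from shifting the result.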

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

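retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10014 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 0 | 0 | 10011 | 0 | 0 | 10010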

Test 3: throughput

Count: 8

Code:

  paciza x0
  paciza x1
  paciza x2
  paciza x3
  paciza x4
  paciza x5
  paciza x6
  paciza x7

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360430 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80111 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360573 | 80220 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

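retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule simd uop (54) | schedule ldst uop (55) | dispatch int uop (56) | dispatch simd uop (57) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80022 | 0 | 1359890 | 80022 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 0 | 0 | 80020 | 0 | 1359931 | 80020 | 20 | 20 | 80011 | 80010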
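
The throughput figure follows from the Test 3 counters in the same way, with one extra division by the instruction count. This is a sketch under assumptions, not the author's tooling: it takes the cycle (02) reading to be the second field of each row (160030 in every run of both configurations) and divides by count times unrolls times iterations. The function and variable names are made up for illustration.

```python
# Test 3 parameters as stated in the text.
COUNT = 8          # PACIZA instructions per copy of the code
UNROLLS = 100
ITERATIONS = 100

# cycle (02) reading, taken as the second field of each row (an
# assumption about the column boundaries). It is the same in every run.
CYCLES = 160030


def cycles_per_instruction(cycles, count, unrolls, iterations):
    """Total cycles divided by the total number of PACIZAs executed."""
    return cycles / (count * unrolls * iterations)


cpi = cycles_per_instruction(CYCLES, COUNT, UNROLLS, ITERATIONS)
ipc = 1 / cpi  # reciprocal: sustained instructions per cycle

print(f"{cpi:.4f} cycles per PACIZA")   # 2.0004
print(f"{ipc:.4f} PACIZA per cycle")    # 0.4999
```

The 1000-unroll, 10-iteration configuration executes the same 80000 instructions in the same 160030 cycles, so it gives the same value.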