Apple Microarchitecture Research by Dougall Johnson

PACIZB

Test 1: uops

Code:

  pacizb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

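retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | dispatch ldst uop (58) | int uops in schedulers (59) | simd uops in schedulers (5a) | ldst uops in schedulers (5b) | dispatch uop (78) | map int uop (7c) | map ldst uop (7d) | map simd uop (7e) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 0 | 52725 | 0 | 0 | 1000 | 0 | 0 | 0 | 1001 | 1000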

Test 2: Latency 1->1

Code:

  pacizb x0
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10205 | 60058 | 10204 | 10204 | 0 | 10211 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 10101 | 10100

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
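
The reported latency appears consistent with taking the median of the cycle (02) samples and dividing by the number of executed copies of the code (unrolls times iterations). The page does not state this formula, so the sketch below is only a cross-check under that assumption. The cycle values are read as the second field of each sample row in the two runs above, and the function name `cycles_per_copy` is hypothetical.

```python
from statistics import median

def cycles_per_copy(cycle_samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies of the code.

    Assumed relationship; the source page does not spell out its formula.
    """
    return median(cycle_samples) / (unrolls * iterations)

# cycle (02) samples, 100 unrolls and 100 iterations (one sample is slightly higher)
run_100x100 = [60029, 60029, 60058, 60029, 60029, 60029, 60029, 60029, 60029, 60029]

# cycle (02) samples, 1000 unrolls and 10 iterations
run_1000x10 = [60029] * 10

print(round(cycles_per_copy(run_100x100, 100, 100), 4))
print(round(cycles_per_copy(run_1000x10, 1000, 10), 4))
```

Under this reading both runs give 60029 / 10000, which matches the 6.0029 reported for each configuration.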

Test 3: throughput

Count: 8

Code:

  pacizb x0
  pacizb x1
  pacizb x2
  pacizb x3
  pacizb x4
  pacizb x5
  pacizb x6
  pacizb x7

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 80202 | 1360430 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80205 | 160064 | 80211 | 80211 | 80220 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80111 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80109 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

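retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | map simd uop inputs (81) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 80022 | 1359890 | 80022 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 2028442595627430312983274959 (last eight values run together in the source; column boundaries not recoverable)
80024 | 160074 | 80033 | 80033 | 80040 | 1360205 | 80059 | 20 | 20 | 0 | 0 | 80036 | 0 | 0 | 80010
80024 | 160123 | 80047 | 80047 | 80058 | 1360094 | 80040 | 20 | 20 | 0 | 0 | 80023 | 0 | 0 | 80010
80024 | 160351 | 80111 | 80111 | 80150 | 1360497 | 80093 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 0 | 0 | 80011 | 0 | 0 | 80010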
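
Several samples in the last run are visibly disturbed, which may be why the result is reported as a median rather than a mean. The sketch below compares the two on the cycle (02) samples for the 1000 unrolls and 10 iterations run. It assumes the result is the median cycle count divided by unrolls times iterations and then by the count of 8, which the page implies but does not state exactly. The cycle values are read as the second field of each sample row, and `per_instruction` is a hypothetical helper name.

```python
from statistics import mean, median

COUNT = 8            # instructions per copy of the code ("Count: 8")
COPIES = 1000 * 10   # unrolls times iterations

# cycle (02) samples for the 1000 unrolls and 10 iterations run
cycles = [160030] * 6 + [160074, 160123, 160351, 160030]

def per_instruction(total_cycles):
    """Cycles per instruction, assuming the result divides by copies and by count."""
    return total_cycles / COPIES / COUNT

median_result = per_instruction(median(cycles))
mean_result = per_instruction(mean(cycles))

print(f"median-based: {median_result:.6f}")
print(f"mean-based:   {mean_result:.6f}")
```

The median-based figure is 160030 / 80000 = 2.000375, in line with the reported 2.0004, while the mean is pulled up to about 2.0009 by the three slower samples.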