Apple Microarchitecture Research by Dougall Johnson


PACDB

Test 1: uops

Code:

  pacdb x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(no loop instructions)

1000 unrolls and 1 iteration

Retires: 1.000

Issues: 1.000

Integer unit issues: 1.001

Load/store unit issues: 0.000

SIMD/FP unit issues: 0.000

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | ? int output thing (e9) | ? int retires (ef)
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000
1004 | 6029 | 1001 | 1001 | 1000 | 52725 | 1000 | 1001 | 1000

Test 2: Latency 1->1

Code:

  pacdb x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code): 6.0029

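retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | schedule ldst uop (55) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | map ldst uop inputs (80) | ? int output thing (e9) | ? ldst retires (ed) | ? simd retires (ee) | ? int retires (ef)
10205 | 60058 | 10206 | 10206 | 0 | 10213 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 202 | 0 | 10105 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100
10204 | 60029 | 10201 | 10201 | 0 | 10200 | 530325 | 10200 | 200 | 200 | 0 | 10101 | 0 | 0 | 10100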
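
The latency figure above can be reproduced from the cycle (02) column. The sketch below assumes the result is the median cycle count divided by the number of executed copies of the test code (unrolls times iterations); the author's actual tooling is not shown, and the function name is hypothetical. The arithmetic agrees with the reported 6.0029.

```python
from statistics import median

# cycle (02) readings for the ten runs in this table
cycles = [60058] + [60029] * 9

UNROLLS = 100     # copies of the test code in the loop body
ITERATIONS = 100  # times the loop body runs


def cycles_per_copy(samples, unrolls, iterations):
    """Median cycle count divided by the number of executed copies.

    Assumed formula, not taken from the author's tooling.
    """
    return median(samples) / (unrolls * iterations)


latency = cycles_per_copy(cycles, UNROLLS, ITERATIONS)
print(f"{latency:.4f}")
```

Taking the median rather than the mean keeps the single slower first run (60058 cycles) from shifting the result.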

1000 unrolls and 10 iterations

Result (median cycles for code): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529885 | 10031 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010
10024 | 60029 | 10021 | 10021 | 10020 | 529785 | 10020 | 20 | 20 | 10011 | 10010

Test 3: Latency 1->2

Chain cycles: 1

Code:

  add x1, x0, x0
  mov x0, 0
  pacdb x0, x1
  mov x0, 1

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code, minus 1 chain cycle): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30204 | 70029 | 20201 | 20201 | 20202 | 1429562 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429839 | 20227 | 10221 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
30204 | 70029 | 20201 | 20201 | 20202 | 1429568 | 20202 | 10204 | 20208 | 20101 | 30100
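
The "minus 1 chain cycle" adjustment can be checked the same way. This is a sketch under the assumption that the extra `add x1, x0, x0` in the dependency chain contributes the stated 1 chain cycle per copy and that this is subtracted after dividing the median cycle count by unrolls times iterations; the variable names are hypothetical.

```python
from statistics import median

# cycle (02) readings for the ten runs in this table (all identical)
cycles = [70029] * 10

UNROLLS = 100
ITERATIONS = 100
CHAIN_CYCLES = 1  # "Chain cycles: 1" from the test description

# Cycles for one pass through the add -> pacdb dependency chain
chain_total = median(cycles) / (UNROLLS * ITERATIONS)

# Remove the chain instruction's share to isolate the 1->2 latency of pacdb
pacdb_latency = chain_total - CHAIN_CYCLES

print(f"{chain_total:.4f} - {CHAIN_CYCLES} = {pacdb_latency:.4f}")
```

The result matches the 1->1 latency from Test 2, which is what the reported figures show for both operands.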

1000 unrolls and 10 iterations

Result (median cycles for code, minus 1 chain cycle): 6.0029

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
30024 | 70029 | 20021 | 20021 | 20022 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30025 | 70058 | 20025 | 20025 | 20047 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30024 | 70029 | 20021 | 20021 | 20020 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010
30025 | 70059 | 20025 | 20025 | 20046 | 1429198 | 20020 | 10020 | 20020 | 20011 | 30010

Test 4: throughput

Count: 8

Code:

  pacdb x0, x8
  pacdb x1, x8
  pacdb x2, x8
  pacdb x3, x8
  pacdb x4, x8
  pacdb x5, x8
  pacdb x6, x8
  pacdb x7, x8

(requires arm64e binary, with arm64e_preview_abi boot arg)

(fused SUBS/B.cc loop)

100 unrolls and 100 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80204 | 160030 | 80201 | 80201 | 80202 | 1360379 | 80202 | 200 | 200 | 80111 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360430 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80205 | 160064 | 80210 | 80210 | 80220 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
80204 | 160030 | 80201 | 80201 | 80202 | 1360481 | 80202 | 200 | 200 | 80101 | 80100
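
The throughput figure divides by one more factor than the latency figures, the instruction count per copy. A minimal sketch, assuming the result is the median cycle count divided by unrolls times iterations and then by the count of 8 independent `pacdb` instructions; the variable names are hypothetical.

```python
from statistics import median

# cycle (02) readings for the ten runs in this table
cycles = [160030] * 5 + [160064] + [160030] * 4

UNROLLS = 100
ITERATIONS = 100
COUNT = 8  # independent pacdb instructions per copy ("Count: 8")

# Cycles for one copy of the 8-instruction block
cycles_per_copy = median(cycles) / (UNROLLS * ITERATIONS)

# Reciprocal throughput: average cycles per pacdb when none depend on each other
cycles_per_instruction = cycles_per_copy / COUNT

print(f"{cycles_per_instruction:.6f}")
```

This gives roughly 2.0004 cycles per instruction, against a latency of about 6 cycles when each `pacdb` depends on the previous one.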

1000 unrolls and 10 iterations

Result (median cycles for code divided by count): 2.0004

retire uop (01) | cycle (02) | schedule uop (52) | schedule int uop (53) | dispatch int uop (56) | int uops in schedulers (59) | dispatch uop (78) | map int uop (7c) | map int uop inputs (7f) | ? int output thing (e9) | ? int retires (ef)
80024 | 160030 | 80021 | 80021 | 80022 | 1359941 | 80022 | 20 | 20 | 80021 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80025 | 160064 | 80031 | 80031 | 80040 | 1360048 | 80040 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010
80025 | 160064 | 80031 | 80031 | 80040 | 1359931 | 80020 | 20 | 20 | 80063 | 80010
80024 | 160030 | 80021 | 80021 | 80020 | 1359931 | 80020 | 20 | 20 | 80011 | 80010